LNAI 2680 



Patrick Blackburn 
Chiara Ghidini 
Roy M. Turner 
Fausto Giunchiglia (Eds.) 



Modeling 
and Using Context 

4th International and Interdisciplinary Conference 
CONTEXT 2003 

Stanford, CA, USA, June 2003, Proceedings 




Lecture Notes in Artificial Intelligence 2680 

Edited by J. G. Carbonell and J. Siekmann 
Subseries of Lecture Notes in Computer Science 




Springer 

Berlin 
Heidelberg 
New York 
Hong Kong 
London 
Milan 
Paris 
Tokyo 




Patrick Blackburn Chiara Ghidini 
Roy M. Turner Fausto Giunchiglia (Eds.) 



Modeling 

and Using Context



4th International and Interdisciplinary Conference 
CONTEXT 2003 

Stanford, CA, USA, June 23-25, 2003 
Proceedings 




Springer 




Series Editors 



Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA 
Jörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editors 

Patrick Blackburn 
INRIA Lorraine 

615, rue du Jardin Botanique, 54602 Villers les Nancy Cedex, France 
E-mail: patrick@aplog.org 

Chiara Ghidini 

University of Liverpool, Department of Computer Science 
Chadwick Building, Peach Street, Liverpool L69 7ZF, UK 
E-mail: chiara@csc.liv.ac.uk 

Roy M. Turner 

University of Maine, Department of Computer Science 
5752 Neville Hall, Orono, ME 04469-5752, USA 
E-mail: rmt@umcs.maine.edu

Fausto Giunchiglia 

University of Trento, Department of Information and Communication Technology 

38050 Povo, Trento, Italy 

E-mail: Fausto.Giunchiglia@unitn.it 

Cataloging-in-Publication Data applied for 

A catalog record for this book is available from the Library of Congress. 

Bibliographic information published by Die Deutsche Bibliothek 

Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; 

detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>. 

CR Subject Classification (1998): I.2, F.4.1, J.3, J.4
ISSN 0302-9743 

ISBN 3-540-40380-9 Springer-Verlag Berlin Heidelberg New York



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, 
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law. 

Springer-Verlag Berlin Heidelberg New York
a member of BertelsmannSpringer Science+Business Media GmbH

http://www.springer.de 

© Springer-Verlag Berlin Heidelberg 2003 
Printed in Germany 

Typesetting: Camera-ready by author, data conversion by PTP-Berlin GmbH, Heidelberg 
Printed on acid-free paper SPIN: 10927588 06/3142 5 43 2 1 0 




Preface 



Whether you are a computer scientist, a logician, a philosopher, or a psychologist, 
it is crucial to understand the role that context and contextual information 
play in reasoning and representation. The conference at which the papers in
this volume were presented was the fourth in an international series devoted to 
research on context, and was held in Stanford (USA) on June 23-25, 2003. The 
first conference in the series was held in Rio de Janeiro (Brazil) in 1997, the 
second was held in Trento (Italy) in 1999, and the third was held in Dundee 
(Scotland, UK) in 2001. 

CONTEXT 2003 brought together representative work from many different 
fields: in this volume you will find philosophical theorizing, logical formalization, 
computational modelling — and, indeed, computational applications — together 
with work that approaches context from a more cognitive orientation. While we 
don’t believe that this volume can capture the lively flavor of discussion of the 
conference itself, we do hope that researchers interested in context (in any of 
its many manifestations) will find something of interest here, perhaps something 
that will inspire new lines of work. 

We are very grateful to our invited speakers: Patrick Brézillon (University
of Paris VI, France), Keith Devlin (CSLI, Stanford), and David Leake (Indiana
University, USA) for presenting three important contemporary perspectives on
the study of context.

Special thanks are due to the program committee and the additional reviewers
(listed below) who shouldered a heavy load and stuck to an exacting
reviewing schedule. Reviewing for an interdisciplinary conference is never easy,
and the CONTEXT series, because of the extraordinarily broad nature of the
topic, imposes especially heavy demands. The program committee and additional
reviewers rose to the challenge splendidly, and we are extremely grateful for
their efforts.

Last but not least, we would like to thank all those people who made CONTEXT
2003 happen at Stanford itself. So thank you to Dikran Karagueuzian
(who chaired the local arrangements committee) and to Keith Devlin, Michele 
King, John Perry, and Elisabetta Zibetti. Finally, special thanks to Roberta 
Ferrario, who handled the CONTEXT 2003 publicity. 
Presupposition Incorporation in Adverbial
Quantification 



David Ahn 

University of Rochester, Rochester, NY 14627, USA, 
http://www.cs.rochester.edu/~davidahn



Abstract. In this paper, we present a critique of the uniform treatment 
of nominal and adverbial quantification in van der Sandt’s binding the- 
ory of presuppositions. We develop an alternative account of adverbial 
quantification framed in terms of Beaver’s reconstruction of Karttunen’s 
satisfaction theory. This account provides a simpler characterization of 
adverbial quantification that does not require recourse to accommoda- 
tion to explain the basic facts regarding presupposition incorporation. 



1 Introduction 

Natural language quantifiers presuppose their domains of quantification. Or,
alternatively, natural language quantifier domains are anaphoric. Whichever way
you look at it, it is uncontroversial that the domain of a natural language
quantifier depends on the context in which it is used. Consider the discourse (1).



(1) There are fifteen boys and fifteen girls at a boarding school. Five girls and 
two boys are day-pupils, though, so only ten girls and thirteen boys live in 
the dormitory. As the dormitory has ten rooms on each of two floors, every 
girl has a room to herself, but some boys have to share. 

The domains of the quantified noun phrases every girl and some boys are re- 
stricted by the context to those girls and boys living in the dormitory, even 
though there are explicitly other girls and boys in our domain of discourse. 

Van der Sandt’s binding theory of presuppositions […], together with the
assumption that quantifiers presuppose their domains, provides a straightforward
account of the contextual restriction of quantifier domains. However, Beaver […]
points out that the binding theory incorrectly predicts that a quantifier domain 
can be restricted through accommodation of presuppositions in the scope of the 
quantifier. Thus, the following sentences are predicted to be equivalent. 

(2) a. #Every German loves his kangaroo. 

b. Every German who has a kangaroo loves his kangaroo. 

Geurts and van der Sandt […] present a revised version of the binding theory
which accounts for the difference between these two sentences by giving up on 
the possibility of determining the domain of a quantifier through intermediate 
accommodation. However, as Ahn points out, accommodation of scopal
presuppositions appears to be crucial in the interpretation of quantificational
adverbs (qadverbs). For example, in the discourse (3), it is the presupposition of
the verb beat that determines the domain of the qadverbs usually and always.

(3) Marvin and John play various games together. Marvin has spent a lifetime
practicing racket sports, but John is simply the superior athlete. John usually
beats Marvin at badminton. He always beats him at tennis.

In this paper, we present an alternative account of the semantics of qadverbs
which provides for incorporation of presuppositions into qadverb domains without
accommodation. Our account is based on Beaver’s reconstruction […] of the
satisfaction theory of Karttunen [3] and relies on the treatment of contexts as
sets of possible worlds and of possible worlds, in turn, as composed of situations.

2 The Binding Theory and Quantification 

The presuppositions of a sentence are those propositions that the sentence seems 
to take for granted. Presuppositions are triggered by a wide variety of linguistic
phenomena, most notably definite descriptions (which presuppose the existence 
and uniqueness of an entity satisfying the description) and factive predicates 
(which presuppose their propositional complements), but also clefts, selectional 
restrictions, adverbial clauses, aspectual verbs, iteratives, quantifiers, and so on. 

Presuppositions are more robust than other entailments of a sentence. They 
are usually preserved when the sentence that carries them is embedded under 
negation or modality, for example. One of the most perplexing problems regard- 
ing presuppositions is the projection problem — the problem of accounting for 
their behavior under various embeddings. In this section, we present one widely 
accepted theory that attempts to solve the projection problem. 



2.1 The Standard Binding Account 

Van der Sandt frames his account in Discourse Representation Theory (DRT)
[…]. A discourse is represented by a Discourse Representation Structure (DRS),
a pair of a universe (a set of discourse referents) and a set of conditions
(atomic predications or complex formulas built out of other DRSs). We notate
a DRS by enclosing it in square brackets with a vertical line separating the
universe from the conditions. A presupposition is represented as an underlined
DRS which is initially placed in the DRS in which its trigger is represented. For
example, John’s son in the discourse (4a) triggers the presupposition that there
is an individual who is John’s son. This presupposition is represented by the
innermost DRS in (4b).

(4) a. John does not have a daughter. If John has a son, John’s son is bald.

b. $[J \mid \neg[D \mid \mathit{daughter}(D, J)],\ [S \mid \mathit{son}(S, J)] \Rightarrow [\ \mid \mathit{bald}(A), \underline{[A \mid \mathit{son}(A, J)]}]]$
Van der Sandt’s basic insight is that presuppositions behave like discourse
anaphors (i.e. pronouns), in that they must be bound to an antecedent. DRT 
provides a structural notion of accessibility that constrains a pronoun’s search 
for an antecedent. On van der Sandt’s account, the same notion of accessibility 
constrains presuppositions, as well. Basically, a potential antecedent is accessible 
to an anaphor if it is introduced in a DRS that contains the anaphor or that is 
the antecedent of a conditional whose consequent contains the anaphor. Thus, in 
the DRS (4b), referents J and S, but not D, are accessible to the presupposition.

A pronoun or presupposition is bound to an antecedent by replacing the
discourse marker corresponding to the pronoun or presupposition with the
antecedent discourse marker and, in the case of a presupposition, adding the
conditions of the presuppositional DRS to the DRS in which the antecedent is
introduced (and then deleting the presuppositional DRS). For example, the result
of binding the presupposition in (4b) to the referent S would be (5).

(5) $[J \mid \neg[D \mid \mathit{daughter}(D, J)],\ [S \mid \mathit{son}(S, J)] \Rightarrow [\ \mid \mathit{bald}(S)]]$
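To make accessibility and binding concrete, the following Python sketch encodes the DRS in (4b) as nested objects, computes the DRSs accessible to the presupposition, and binds it to S, yielding (5). The classes `DRS`, `Neg`, `Imp`, and `Presup` and the functions `find_sites`, `rename`, and `bind` are hypothetical illustrations, not part of van der Sandt's formulation; the sketch assumes one-referent presuppositions with atomic conditions and renames the presupposed marker only in the DRS that hosts the trigger.

```python
from dataclasses import dataclass, field


@dataclass
class DRS:
    universe: list                                   # discourse referents
    conditions: list = field(default_factory=list)   # atoms are (pred, arg, ...) tuples


@dataclass
class Neg:
    body: DRS


@dataclass
class Imp:
    antecedent: DRS
    consequent: DRS


@dataclass
class Presup:
    body: DRS   # an "underlined" presuppositional DRS awaiting resolution


def find_sites(drs, target, path=()):
    """Return the DRSs accessible from `target`, outermost first, or None."""
    path = path + (drs,)
    for cond in drs.conditions:
        if cond is target:
            return list(path)
        children = []
        if isinstance(cond, Neg):
            children = [(cond.body, path)]
        elif isinstance(cond, Imp):
            # the antecedent box is accessible from inside the consequent
            children = [(cond.antecedent, path),
                        (cond.consequent, path + (cond.antecedent,))]
        for child, child_path in children:
            found = find_sites(child, target, child_path)
            if found is not None:
                return found
    return None


def rename(cond, old, new):
    return tuple(new if x == old else x for x in cond)


def bind(root, target, antecedent):
    """Bind a one-referent presupposition to `antecedent` (hypothetical sketch)."""
    sites = find_sites(root, target)
    if sites is None:
        raise ValueError("presupposition not found")
    home = next((d for d in reversed(sites) if antecedent in d.universe), None)
    if home is None:
        raise ValueError(f"{antecedent} is not accessible")
    (marker,) = target.body.universe
    for cond in target.body.conditions:   # add renamed conditions to the home DRS
        new = rename(cond, marker, antecedent)
        if new not in home.conditions:
            home.conditions.append(new)
    host = sites[-1]                      # the DRS in which the trigger sits
    host.conditions = [rename(c, marker, antecedent) if isinstance(c, tuple) else c
                       for c in host.conditions if c is not target]


# (4b): J is global, D sits under negation, S is in the conditional's antecedent
presup = Presup(DRS(["A"], [("son", "A", "J")]))
antecedent = DRS(["S"], [("son", "S", "J")])
consequent = DRS([], [("bald", "A"), presup])
drs_4b = DRS(["J"], [Neg(DRS(["D"], [("daughter", "D", "J")])),
                     Imp(antecedent, consequent)])

print([r for d in find_sites(drs_4b, presup) for r in d.universe])  # ['J', 'S']
bind(drs_4b, presup, "S")                                            # yields (5)
print(consequent.conditions)                                         # [('bald', 'S')]
```

Trying to bind the same presupposition to D raises `ValueError`, mirroring the fact that D, introduced under negation, is not accessible.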

There is an important difference between presuppositions and pronouns. If 
a suitable antecedent for a presupposition cannot be found, one can simply be 
created in an accessible DRS using the descriptive content of the presupposition. 
The creation of an antecedent for a presupposition that otherwise would not have 
one is the realization in this theory of Lewis’s notion of accommodation […]. For
example, if the antecedent of the conditional in (4a) were John is bald rather than
John has a son, then, assuming that John cannot be his own son, there would
be no suitable antecedent for the presupposition in (4b). Thus, we would simply
create one, which, in the case of global accommodation, would result in (6).



(6) $[J\,A \mid \neg[D \mid \mathit{daughter}(D, J)],\ \mathit{son}(A, J),\ [\ \mid \mathit{bald}(J)] \Rightarrow [\ \mid \mathit{bald}(A)]]$



Van der Sandt’s mechanisms of binding and accommodation provide an ac- 
count of the projection problem — global resolution results in projection; local or 
intermediate resolution, in cancellation. There are a variety of preferences and 
constraints that govern the resolution of presuppositions. In short, resolution 
must not result in a DRS that is uninterpretable, inconsistent, or redundant. 
Also, binding is preferred to accommodation, and more local binding and more 
global accommodation are preferred. 
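These preferences can be read as an ordering over candidate resolution sites. The following Python sketch is a hypothetical illustration: the `Option` record and its `depth` field (0 for the main DRS, larger for more deeply embedded DRSs) are assumptions, and the `acceptable` callback merely stands in for the checks against uninterpretable, inconsistent, or redundant results.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class Option(NamedTuple):
    kind: str    # "bind" or "accommodate"
    depth: int   # 0 = main DRS; larger = more deeply embedded (more local)
    label: str


def preference_key(opt: Option):
    """Binding before accommodation; local binding, global accommodation."""
    if opt.kind == "bind":
        return (0, -opt.depth)   # deeper binding sites come first
    return (1, opt.depth)        # shallower accommodation sites come first


def resolve(options: Iterable[Option],
            acceptable: Callable[[Option], bool]) -> Optional[Option]:
    """Return the most preferred option that passes the constraints, if any."""
    for opt in sorted(options, key=preference_key):
        if acceptable(opt):
            return opt
    return None


# Variant of (4a) with "John is bald" as antecedent: nothing to bind to
sites = [Option("accommodate", 0, "global"),
         Option("accommodate", 1, "intermediate"),
         Option("accommodate", 2, "local")]
print(resolve(sites, lambda o: True).label)                             # global
print(resolve(sites + [Option("bind", 1, "S")], lambda o: True).label)  # S
```

Passing a callback that rejects global accommodation (for instance, because it would make the main DRS inconsistent) makes the sketch fall back to intermediate accommodation.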



2.2 Quantification in the Standard Account 

In DRT, the conditional is given a strong semantics (every assignment that 
verifies its antecedent must verify its consequent), so that (7a) and (7b) may be
treated equivalently as (7c).¹

¹ In our binding theory examples, we omit the domain set presupposition necessary
to capture the context-dependence of a quantifier.
(7) a. If a farmer owns a donkey, he beats it.

b. Every farmer who owns a donkey beats it. 

c. $[\ \mid [X\,Y \mid \mathit{farmer}(X), \mathit{donkey}(Y), \mathit{owns}(X, Y)] \Rightarrow [\ \mid \mathit{beats}(X, Y)]]$

As a consequence, the restrictor of a quantifier is predicted to be accessible for 
the resolution of pronouns and presuppositions. It seems that as far as binding 
goes, this prediction is correct. For example, the correct interpretation of (8a) is
the one in which the complex presupposition triggered by his donkey is bound 
to the discourse referents F and D introduced in the restrictor of the quantifier. 
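The strong semantics of the conditional in (7c) can be checked by brute force over a small finite model: every assignment to X and Y that verifies the antecedent conditions must also verify the consequent. The model (`pedro`, `juan`, `d1`, `d2`) and the tuple encoding of conditions are hypothetical, chosen only for illustration.

```python
from itertools import product


def verifies(assignment, conditions, facts):
    """All atomic conditions, e.g. ("owns", "X", "Y"), hold under the assignment."""
    return all((pred, *(assignment[v] for v in args)) in facts
               for pred, *args in conditions)


def conditional_holds(domain, facts, antecedent, consequent, variables):
    """Strong reading: every assignment verifying the antecedent verifies the consequent."""
    for values in product(domain, repeat=len(variables)):
        g = dict(zip(variables, values))
        if verifies(g, antecedent, facts) and not verifies(g, consequent, facts):
            return False
    return True


# Hypothetical model: pedro owns two donkeys; juan owns none
domain = ["pedro", "juan", "d1", "d2"]
base = {("farmer", "pedro"), ("farmer", "juan"), ("donkey", "d1"), ("donkey", "d2"),
        ("owns", "pedro", "d1"), ("owns", "pedro", "d2")}
restrictor = [("farmer", "X"), ("donkey", "Y"), ("owns", "X", "Y")]
scope = [("beats", "X", "Y")]

beats_both = base | {("beats", "pedro", "d1"), ("beats", "pedro", "d2")}
beats_one = base | {("beats", "pedro", "d1")}
print(conditional_holds(domain, beats_both, restrictor, scope, ["X", "Y"]))  # True
print(conditional_holds(domain, beats_one, restrictor, scope, ["X", "Y"]))   # False
```

Because the check ranges over farmer-donkey pairs, a farmer who beats only one of his two donkeys already falsifies (7c).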



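(8) a. Every farmer who owns a donkey beats his donkey.

b. $[\ \mid [F\,D \mid \mathit{farmer}(F), \mathit{donkey}(D), \mathit{owns}(F, D)] \Rightarrow [\ \mid \mathit{beats}(F, X), \underline{[X \mid \mathit{donkey}(X), \mathit{owns}(Y, X), \underline{[Y \mid \mathit{male}(Y)]}]}]]$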





Unfortunately, as Beaver [2] points out, the prediction that presuppositions
may also be accommodated by the restrictor seems to be wrong. Consider sentences
(2a) and (2b). The initial DRS for sentence (2a) is depicted in (9).



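(9) $[\ \mid [G \mid \mathit{german}(G)] \Rightarrow [\ \mid \mathit{loves}(G, X), \underline{[X \mid \mathit{kangaroo}(X), \mathit{owns}(Y, X), \underline{[Y \mid \mathit{male}(Y)]}]}]]$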





The standard binding account predicts that the presupposition is accommodated
by the restrictor, resulting in the DRS (10). This DRS is also predicted to be the
final interpretation of (2b), after the presupposition triggered by his kangaroo is
resolved by the restrictor (we omit the initial representation for brevity).



(10) $[\ \mid [G\,K \mid \mathit{german}(G), \mathit{kangaroo}(K), \mathit{owns}(G, K)] \Rightarrow [\ \mid \mathit{loves}(G, K)]]$

Of course, sentence (2a) is clearly not as felicitous as sentence (2b). It does not
seem to be the case that the presupposition triggered by his kangaroo can restrict 
the domain of the quantificational NP every German to just those Germans 
who own kangaroos. The binding theory simply makes the wrong predictions 
regarding intermediate accommodation and quantification. 



2.3 Adverbial Quantification in the Standard Account 

The situation is somewhat different when we turn to adverbial quantification. 
Unlike a quantificational determiner, which is syntactically associated with a 
constituent that provides its restrictor argument (namely, the N’), a qadverb 
has no syntactic relationship with its restrictor argument. Several authors have 
suggested that the presuppositions of an adverbially quantified sentence determine
the restrictor of a qadverb […]. Thus, for example, in the discourse (3),
it is the presupposition of the verb beat (x beats y in some game presupposes
that x plays y in that game) that provides the description of the domains of
quantification for the two adverbially quantified sentences. 
Intermediate accommodation seems to be exactly the mechanism which is
at work here. Ahn […] presents an account which relies heavily on intermediate
accommodation and on the explicit representation of situations. Adverbial
quantification is treated as quantification over situations. Presuppositions are
associated with resource situations, following Cooper […]. Initially, an adverbially
quantified sentence is represented with a vacuous restrictor that provides
a situation variable with no restrictions, as in (11b), a simplified DRS for (11a).²



(11) a. John usually beats Marvin at badminton.

b. $[J\,M \mid [S \mid S : [\ \mid\ ]]\ \mathrm{USU}\ [\ \mid S : [\ \mid \mathit{win}(J), \underline{[S' \mid S' : [\ \mid \mathit{play\_bdtn}(J, M)]]}]]]$



Accommodation of scopal presuppositions in the restrictor results in binding 
their resource situations to the situation variable. This then provides the restric- 
tion, and a presupposition of a set of situations based on this restriction is trig- 
gered. Resolving this presupposition then accounts for the context-dependence 
of the qadverb domain. Thus, in (11a), the presupposition associated with beat
is accommodated by the restrictor of usually, resulting in a restrictor that
specifies situations of badminton-playing, as in (12). A presupposition of a set of
badminton-playing situations would then be generated and resolved by global
accommodation (there is a set of situations available in the discourse, but since
the available set is of situations of playing various games, it doesn’t match the
presupposition).



(12) $[J\,M \mid [S \mid S : [\ \mid \mathit{play\_bdtn}(J, M)]]\ \mathrm{USU}\ [\ \mid S : [\ \mid \mathit{win}(J)]]]$



While this account provides more or less the correct interpretations for ad- 
verbially quantified sentences, it suffers from several faults. First of all, it invokes 
accommodation, which, in the binding theory, is intended as a repair strategy, 
to account for all adverbially quantified sentences in all contexts. Secondly, it 
imposes a strict interleaved order on steps of presupposition resolution and com- 
putation; in particular, it requires that first, presuppositions in the scope be 
computed; second, these presuppositions be resolved; and third, a presupposi- 
tion corresponding to the restrictor be computed. Finally, it is parasitic on the 
standard account of nominal quantification, even though nominal and adverbial 
quantification are syntactically quite different and behave differently precisely 
with respect to intermediate accommodation of scopal presuppositions. 



2.4 The Revised Account 

In response to Beaver’s criticisms, Geurts and van der Sandt […] present a revised
account of quantification in the binding theory. The most striking departure from
the standard account is that DRSs are reified, so that there is a sort of discourse
referent that is associated with a DRS and that refers to the set of assignments
which embeds the DRS. By making such a referent anaphoric, it is possible for
local operations, such as intermediate accommodation, to have non-local effects.

² The colon operator is intended to indicate a support relation between a situation
and a formula.

In particular, the restrictor of a quantifier is treated as such an anaphoric 
discourse referent. Since, on this account, presuppositions are resolved from left 
to right, this restrictor referent is bound to an antecedent before the presup- 
positions of the scope are resolved. These scopal presuppositions may then be 
accommodated by the restrictor DRS, but this results in the direct imposition 
of conditions on the antecedent set. If these conditions are not compatible with 
the antecedent set, resolution fails. Thus, the scopal presuppositions cannot be 
used as the basis for accommodating a new domain set — the identity of the 
domain set is decided before the scopal presuppositions are resolved. 

With this revised account, Geurts and van der Sandt manage to make the
correct predictions for (2a). The restrictor is resolved first by being bound
to a set of Germans. Out of the blue, this set of Germans is apt to be the set
of all Germans. Accommodation of the scopal presupposition requires imposing 
an additional condition on this domain set, namely, that every German has a 
kangaroo. Since this additional condition is clearly in conflict with what we know 
about Germans, the resolution fails, resulting in infelicity. There is no option to 
accommodate a set which satisfies the scopal presuppositions. 
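The contrast between the two treatments of (2a) can be sketched with toy sets: on the standard account, accommodating the scopal presupposition narrows the domain, whereas on the revised account the domain set is fixed first and the accommodated condition must hold of all its members. The domain, the ownership facts, and the function names are hypothetical, and compatibility is simplified here to truth in a closed-world fact base.

```python
def standard_restrict(candidates, condition):
    """Standard account: intermediate accommodation narrows the domain."""
    return {x for x in candidates if condition(x)}


def revised_impose(domain, condition):
    """Revised account: the domain is fixed first, and the accommodated condition
    must hold of every member; otherwise resolution fails (None)."""
    return domain if all(condition(x) for x in domain) else None


germans = {"anna", "bernd", "clara"}   # hypothetical domain set
kangaroo_owners = {"clara"}            # hypothetical world knowledge


def has_kangaroo(x):
    return x in kangaroo_owners


print(standard_restrict(germans, has_kangaroo))   # {'clara'}
print(revised_impose(germans, has_kangaroo))      # None
```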

Geurts and van der Sandt also present an account of adverbial quantification 
that is based on their account of nominal quantification. Like Ahn, they begin 
with a vacuous restrictor, and as with nominal quantifiers, this (vacuous) restrictor
is resolved before the scopal presuppositions. Unfortunately, as we have seen
in example (3), the domain set must be accommodated on the basis of the
scopal presuppositions, which is exactly the option that this revised mechanism 
makes impossible. This account of adverbial quantification, too, is plagued by 
an undue reliance on an analysis of nominal quantification. 

3 Satisfaction 

Van der Sandt’s binding theory of presupposition, in which presuppositions are
treated as entities that can be resolved (and thus, in a sense, cancelled) within a 
highly structured context, is only one of many ways to approach the problem of 
presuppositions. In the remainder of this paper, we adopt an alternative account 
of presuppositional phenomena: the satisfaction theory of Karttunen [3].

Karttunen’s theory takes seriously the pretheoretic intuition that a presup- 
position is something that is taken for granted. The presuppositions of a sentence 
are taken to be constraints on the contexts in which the sentence can be uttered. 
Thus, a context admits a sentence if and only if it satisfies the presuppositions 
of the sentence. Crucially, the presuppositions of the components of a complex
sentence do not necessarily constrain the context of the complex sentence itself.
Instead, one part of a complex sentence may establish an updated context of
evaluation for another part. For example, the first conjunct of a conjunction is
evaluated with respect to the same context as the entire conjunction, but the
second conjunct is evaluated with respect to the update of that context with the
first conjunct. Thus, if the first conjunct entails the presuppositions of the sec- 
ond, those presuppositions do not project — the updated context with respect 
to which the second conjunct is evaluated will always entail them. 
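The filtering behavior of conjunction can be illustrated with a minimal eliminative update function. In this sketch, a simplified stand-in for the update semantics formalized in Section 3.2, worlds are hypothetical frozensets of the atoms true in them, a context is a frozenset of worlds, and `None` signals that a context does not admit a sentence; the atom names are invented for the example.

```python
def update(context, formula):
    """Return the updated context, or None if the context does not admit `formula`."""
    op = formula[0]
    if op == "atom":
        # eliminate the worlds in which the atom is false
        return frozenset(w for w in context if formula[1] in w)
    if op == "and":
        # the second conjunct is evaluated in the context updated by the first
        mid = update(context, formula[1])
        return None if mid is None else update(mid, formula[2])
    if op == "presup":
        # admitted only if updating with the presupposition changes nothing
        return context if update(context, formula[1]) == context else None
    raise ValueError(f"unknown operator {op!r}")


w_no_son = frozenset()
w_son = frozenset({"has_son"})
w_bald_son = frozenset({"has_son", "son_bald"})
context = frozenset({w_no_son, w_son, w_bald_son})

sons_bald = ("and", ("presup", ("atom", "has_son")), ("atom", "son_bald"))
both = ("and", ("atom", "has_son"), sons_bald)

print(update(context, sons_bald))                         # None
print(update(context, both) == frozenset({w_bald_son}))   # True
```

On its own, "John's son is bald" is not admitted by a context that leaves open whether John has a son; preceded by the conjunct "John has a son", the presupposition is satisfied by the intermediate context and does not project.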

We build on Beaver’s formalization of the satisfaction theory in terms of
an update semantics […], which, in turn, builds on Heim’s dynamic account in
terms of context-change potential […]. By treating the meaning of a sentence
dynamically, as a function from input contexts to output contexts, it is possible
to tie the admittance conditions of a sentence directly to its semantics.
Following Stalnaker […], Beaver treats a context as a set of possible worlds. A
context consists of the possible worlds compatible with what has already been 
expressed in a conversation and thus represents the “live options” left open by 
the conversation. We show that by treating possible worlds as structures that 
are decomposable into constituent situations, adverbial quantification can be 
given an analysis that accounts for both domain determination by presupposi- 
tion incorporation and the anaphoricity of qadverb domains without recourse 
to intermediate accommodation. This analysis also avoids the inaccurate anal- 
ogy with nominal quantification and is thus free to provide a one-place logical 
operator, which corresponds more closely to the syntactic type of a qadverb. 



3.1 Context-Change Potential and Situation Theory 

The theory of interpretation we adopt here identifies the meaning of a sentence 
not with its truth conditions but with its context-change potential. The deno- 
tation of a sentence is defined as a relation between an input context and an 
output context. As we stated earlier, we take a context to be a set of possible 
worlds, and we use an update semantics, along the lines of Veltman […], to model
context change. The set of possible worlds that constitute a context is intended 
to represent the possibilities left open by the discourse so far. To update a con- 
text with a sentence, the worlds which are not compatible with the sentence are 
thrown out. Thus, as a discourse progresses, possibilities are winnowed away and 
the context comes closer and closer to the speaker’s view of the actual world. 

We augment this notion of context by taking possible worlds to have situational
substructure. We take a view of situations along the lines of Schubert et
al. [16,17]. In the ontology of situations given there, situations are individual
entities in the domain of discourse. The set of situations S within the domain of
discourse is subject to a partial part-of ordering (notated $\sqsubseteq$) which induces a join
semi-lattice structure (the join operation is notated with the infix operator $\sqcup$).
Thus, any two situations may be joined to form a larger situation. We identify
the maximal element of the semi-lattice as the world.

In order to introduce multiple possible worlds into an ontology such as this 
one, we weaken the structure of the set of situations. Instead of requiring that it 
be a join semi-lattice (in other words, that every pair of situations have a unique 
join), we merely require that it be a set of possibly overlapping join semi-lattices. 
In order to enforce this requirement, we designate a subset of the set of situations 
S as the set of worlds W and require that every situation stand in the part-of 
relation with at least one world. We further require that if two situations $t, u$
are both part of the same world $w$, then there exists a situation $s \sqsubseteq w$ such that
$t \sqsubseteq s$, $u \sqsubseteq s$, and for every situation $s' \sqsubseteq w$, if $t \sqsubseteq s'$ and $u \sqsubseteq s'$, then $s \sqsubseteq s'$.
We call $s$ the join of $t$ and $u$ with respect to $w$ and write $s = t \sqcup_w u$.
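The relativized join can be computed directly in a finite model. In the sketch below, situations are modelled as hypothetical frozensets of facts with inclusion as the part-of order; this encoding is an assumption made for illustration, not part of Schubert's ontology. The two worlds `w1` and `w2` share the situation `t`, so the situations form two overlapping join semi-lattices.

```python
def part_of(a, b):
    """Hypothetical encoding: situations are frozensets of facts; part-of is inclusion."""
    return a <= b


def join_wrt(t, u, w, situations):
    """The join of t and u with respect to world w, or None if it does not exist."""
    if not (part_of(t, w) and part_of(u, w)):
        return None
    # upper bounds of t and u that are still part of w
    bounds = [s for s in situations if part_of(s, w) and part_of(t, s) and part_of(u, s)]
    # the least such bound is part of every other bound
    least = [s for s in bounds if all(part_of(s, b) for b in bounds)]
    return least[0] if least else None


t, u, v = frozenset({"a"}), frozenset({"b"}), frozenset({"c"})
w1, w2 = frozenset({"a", "b"}), frozenset({"a", "c"})   # two worlds sharing t
situations = [t, u, v, w1, w2]

print(join_wrt(t, u, w1, situations) == w1)   # True
print(join_wrt(t, v, w2, situations) == w2)   # True
print(join_wrt(u, v, w1, situations))         # None: no world contains both u and v
```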

The most distinctive characteristic of Schubert’s situation theory is that there 
are two different relations in which a situation and a formula may find them- 
selves. The support relation holds between a formula and a situation which is 
(at least) partially described by the formula; this more familiar relation is similar
to the support relation in the Situation Semantics of Barwise and Perry […].
The characterization relation, which is a generalization of the relation between a
Davidsonian event predication and the event described by the predication, holds
between a situation and a formula which provides a description that applies to
the entire situation as a whole. This description need not be fully specified or
detailed; it need only describe the entire situation rather than merely a part of it.
Interestingly, the characterization relation between a situation and
a negated formula does not hold simply whenever the characterization relation
fails to hold between the situation and the non-negated formula. Instead, there
is a notion of anti-characterization: a formula $\neg\varphi$ characterizes a situation $s$
if and only if $s$ is a situation of $\varphi$ not holding.³ Both the support and
characterization relations are defined in terms of the basic Davidsonian relation between
events and atomic event predicates and a corresponding anti-relation.

Following Ahn and Schubert […], we adopt a reconstruction of the propositional
fragment of Schubert’s first-order situation logic FOL** as a modal
propositional logic, in which the states are situations and there are modalities
corresponding to the subsumption (the dual of part-of) and join relations. The
satisfaction⁴ relation of our logic corresponds to the characterization relation.
Support can be defined as possibility with respect to the subsumption modality.
Our update semantics, then, is exactly Beaver’s, except that the atomic
update condition applies to all modal formulas and is stated with respect to the
modal satisfaction relation.



3.2 A Propositional Update Logic with Adverbial Quantification 

A model $\mathcal{M}$ for our logic is a 5-tuple $\langle\langle S, \sqsubseteq\rangle, W, \sqsupseteq, \sqcup, \langle\mathcal{L}^+, \mathcal{L}^-\rangle\rangle$. $\langle S, \sqsubseteq\rangle$ is
the partially ordered set of situations; $W$ is the subset of $S$ that is the set of
worlds. The binary relation $\sqsupseteq$ (subsumption) is the dual of the partial order
$\sqsubseteq$. The ternary relation $\sqcup$ is the “is-the-join-of-with-respect-to-some-world”
relation; thus, $\sqcup(s, t, u)$ iff $s$ is the join of $t$ and $u$ with respect to some world.

³ What constitutes a situation of some formula not holding is an interesting question.
One possibility is a situation of some positive event that precludes the truth of the
formula. See Schubert et al. [16,17], as well as Cooper […], for further discussion.
⁴ We use the term satisfaction for two different relations. One is Karttunen’s relation
between a context and a sentence; the other is the relation between a state in a model
and a formula of our modal logic. Where there may be confusion, we use context or
modal satisfaction.
Finally, $\mathcal{L}^+$ and $\mathcal{L}^-$ are the positive and negative interpretation functions;
they correspond directly to atomic characterization and anti-characterization.

First, we give the modal satisfaction conditions that will form the atomic 
basis for our update semantics. 

Definition 1 (Modal satisfaction conditions). 

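For a model $\mathcal{M}$ and a situation $s$:

$\mathcal{M}, s \Vdash p$ iff $s \in \mathcal{L}^+(p)$,

$\mathcal{M}, s \Vdash \Diamond\varphi$ iff $\exists s'.\,(s \sqsupseteq s')$ and $(\mathcal{M}, s' \Vdash \varphi)$,

$\mathcal{M}, s \Vdash \Box\varphi$ iff $\forall s'.\,(s \sqsupseteq s')$ only if $(\mathcal{M}, s' \Vdash \varphi)$,

$\mathcal{M}, s \Vdash \varphi \circ \psi$ iff $\exists t, u.\,(\mathcal{M}, t \Vdash \varphi)$ and $(\mathcal{M}, u \Vdash \psi)$ and $\sqcup(s, t, u)$,

$\mathcal{M}, s \Vdash \neg\varphi$ iff $\mathcal{M}, s \Vdash^- \varphi$,

$\mathcal{M}, s \Vdash \varphi \vee \psi$ iff $\mathcal{M}, s \Vdash \varphi$ or $\mathcal{M}, s \Vdash \psi$.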

The conditions for anti-satisfaction ($\Vdash^-$) are the dual of these conditions
(i.e. $\Vdash$ and $\Vdash^-$ are interchanged, as are $\circ$ and $\vee$, and universal and existential
quantification, and so on).

There are several crucial things to observe here. The first is that the satisfac- 
tion relation between a situation and a formula corresponds to characterization. 
Another is that the binary modal operator $\circ$ corresponds to conjunction under
characterization for FOL**. Since each characterization statement of FOL** is
intended to correspond to a single tensed clause, $\circ$ is intended to be used to
“conjoin” the atomic predications that correspond to the translation of a single
tensed clause. Thus, we treat $\circ$ as the dual of $\vee$. Finally, we can think of the
subsumption relation $\sqsupseteq$ as a sort of accessibility relation between situations, which
can be used to define Schubert’s notions of support and inward persistence.

Definition 2 (Support and inward persistence).

A situation $s$ supports a formula $\varphi$ iff $s \Vdash \Diamond\varphi$.

A situation $s$ is inward persistent with respect to a formula $\varphi$ iff $s \Vdash \Box\varphi$.
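Definitions 1 and 2 can be prototyped as a small model checker. Everything below is a hypothetical toy model: two atomic situations inside one world, with dictionaries `PLUS` and `MINUS` standing in for $\mathcal{L}^+$ and $\mathcal{L}^-$. The anti-satisfaction clauses follow our reading of the duality stated after Definition 1 (interchanging satisfaction and anti-satisfaction, $\circ$ and $\vee$, and the quantifiers), and the join relation is computed as set union, which is valid only because this toy model has a single world and is closed under union.

```python
# Toy model (hypothetical): two atomic situations inside a single world.
s_win = frozenset({"e_win"})     # a situation of John winning
s_play = frozenset({"e_play"})   # a situation of John playing
w = s_win | s_play               # the world, containing both
S = [s_win, s_play, w]

PLUS = {"win": {s_win}, "play": {s_play}}   # atomic characterization
MINUS = {"win": set(), "play": set()}       # atomic anti-characterization

# S has one world and is closed under union, so joins are just unions here.
JOIN = {(t | u, t, u) for t in S for u in S if (t | u) in S}


def parts(s):
    """All situations that s subsumes (s itself included)."""
    return [x for x in S if x <= s]


def sat(s, f):
    op = f[0]
    if op == "atom":
        return s in PLUS[f[1]]
    if op == "dia":
        return any(sat(x, f[1]) for x in parts(s))
    if op == "box":
        return all(sat(x, f[1]) for x in parts(s))
    if op == "circ":
        return any(j == s and sat(t, f[1]) and sat(u, f[2]) for j, t, u in JOIN)
    if op == "not":
        return anti(s, f[1])
    if op == "or":
        return sat(s, f[1]) or sat(s, f[2])
    raise ValueError(f"unknown operator {op!r}")


def anti(s, f):
    """Dual clauses: sat/anti, circ/or, and the quantifiers are interchanged."""
    op = f[0]
    if op == "atom":
        return s in MINUS[f[1]]
    if op == "dia":
        return all(anti(x, f[1]) for x in parts(s))
    if op == "box":
        return any(anti(x, f[1]) for x in parts(s))
    if op == "circ":
        return anti(s, f[1]) or anti(s, f[2])
    if op == "not":
        return sat(s, f[1])
    if op == "or":
        return any(j == s and anti(t, f[1]) and anti(u, f[2]) for j, t, u in JOIN)
    raise ValueError(f"unknown operator {op!r}")


def supports(s, f):            # Definition 2
    return sat(s, ("dia", f))


def inward_persistent(s, f):   # Definition 2
    return sat(s, ("box", f))


win, play = ("atom", "win"), ("atom", "play")
print(sat(w, ("circ", win, play)))     # True: w is characterized by win ∘ play
print(sat(w, win), supports(w, win))   # False True
print(inward_persistent(s_win, win))   # True
print(sat(s_play, ("not", win)))       # False: negation is not mere failure
```

The last line reflects the point made about anti-characterization: `s_play` is not characterized by win, but nothing in `MINUS` makes it a situation of John not winning.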

A context is a set of worlds. The denotation of a formula of our logic is a 
relation between contexts, given by the following update semantics. 

Definition 3 (Update conditions). 

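For contexts $\sigma$ and $\tau$, the denotation $[\![\cdot]\!]$ of a formula is given recursively, as
follows:

$\sigma[\![\varphi_{modal}]\!]\tau$ iff $\tau = \{w \in \sigma \mid w \Vdash \Diamond\varphi_{modal}\}$,
$\sigma[\![\neg\varphi]\!]\tau$ iff $\exists\upsilon.\,\sigma[\![\varphi]\!]\upsilon$ and $\tau = \sigma \setminus \upsilon$,
$\sigma[\![\varphi \wedge \psi]\!]\tau$ iff $\exists\upsilon.\,\sigma[\![\varphi]\!]\upsilon$ and $\upsilon[\![\psi]\!]\tau$,
$\sigma[\![\varphi \rightarrow \psi]\!]\tau$ iff $\sigma[\![\neg(\varphi \wedge \neg\psi)]\!]\tau$,
$\sigma[\![\varphi \vee \psi]\!]\tau$ iff $\sigma[\![\neg(\neg\varphi \wedge \neg\psi)]\!]\tau$,
$\sigma[\![\partial\varphi]\!]\tau$ iff $\sigma \models \varphi$ and $\tau = \sigma$.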



In the first condition, $\varphi_{modal}$ indicates any atomic formula or any formula
whose top-level operator is one of $\Diamond$, $\Box$, or $\circ$. The condition states that the
result of updating an input context with an atomic proposition is the subset
of the input context containing just those possible worlds which support the
proposition.

The remaining conditions are just as Beaver defines them. The second condition states that an input context is updated by a negation just in case it can be updated by the non-negated formula; the output context is the set of those worlds in the input context which are not present in the update by the non-negated formula. The third condition states that an input context is updated by a conjunction just in case it can be updated by the first conjunct and the result of that update can be updated by the second conjunct; the output context is the result of this second update. The fourth and fifth clauses define implication and disjunction in terms of negation and conjunction. The sixth clause defines the semantics of the unary presupposition operator ($\partial$) in terms of the notion of context satisfaction, which is defined next.
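To make the recursive clauses concrete, here is a minimal Python sketch of the update relation over finite contexts. It is an illustration under our own assumptions, not Beaver's formulation: worlds are plain string labels, an atomic (or modal) formula is represented directly by the set of worlds $w$ with $w \Vdash \Diamond\phi_{modal}$, and the partial relation $\sigma[\phi]\tau$ is modelled as a function that returns `None` when no output context exists. The class names `Atom`, `Not`, `And`, `Presup` and the helpers `implies` and `disj` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union

World = str
Context = FrozenSet[World]


@dataclass(frozen=True)
class Atom:
    # Assumption: an atomic (or modal) formula is given extensionally as the
    # set of worlds w such that w ⊩ ◇φ_modal.
    supporting: FrozenSet[World]


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Presup:  # the unary presupposition operator ∂
    arg: "Formula"


Formula = Union[Atom, Not, And, Presup]


def update(sigma: Context, phi: Formula) -> Optional[Context]:
    """Return the output context tau with sigma[phi]tau, or None if there is none."""
    if isinstance(phi, Atom):
        # First clause: keep the worlds of sigma that support the formula.
        return frozenset(w for w in sigma if w in phi.supporting)
    if isinstance(phi, Not):
        # Negation: defined iff the positive update is; remove its output worlds.
        upsilon = update(sigma, phi.arg)
        return None if upsilon is None else sigma - upsilon
    if isinstance(phi, And):
        # Conjunction: update with the left conjunct, then with the right one.
        upsilon = update(sigma, phi.left)
        return None if upsilon is None else update(upsilon, phi.right)
    if isinstance(phi, Presup):
        # ∂φ: defined only if sigma already satisfies φ (sigma[φ]sigma); then a no-op.
        return sigma if update(sigma, phi.arg) == sigma else None
    raise TypeError(f"unknown formula: {phi!r}")


def implies(phi: Formula, psi: Formula) -> Formula:
    # Fourth clause: φ → ψ is ¬(φ ∧ ¬ψ).
    return Not(And(phi, Not(psi)))


def disj(phi: Formula, psi: Formula) -> Formula:
    # Fifth clause: φ ∨ ψ is ¬(¬φ ∧ ¬ψ).
    return Not(And(Not(phi), Not(psi)))
```

For example, with `p = Atom(frozenset({"w1", "w2"}))`, updating `frozenset({"w1", "w2", "w3"})` with `Presup(p)` returns `None` (presupposition failure), whereas updating `frozenset({"w1"})` with it returns that context unchanged.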

Definition 4 (Context satisfaction).

$$\sigma \models \phi \text{ iff } \sigma[\phi]\sigma.$$

A context satisfies a formula just in case updating the context with the formula results in no new information being added to the context. Thus, an input context is updated by an elementary presupposition indicated by the $\partial$ operator just in case the context already satisfies the presupposition. If the input context does not satisfy the presupposition, the update fails.

Beaver formalizes Karttunen’s notion of admittance as follows. 

Definition 5 (Admittance).

$$\sigma \rhd \phi \text{ iff } \exists\tau.\, \sigma[\phi]\tau.$$

A context admits a formula if and only if it is possible to update the context 
with the formula. Beaver defines the relation of presupposition between complex 
formulas in terms of admittance and satisfaction. 

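Definition 6 (Presupposition).

$$\phi \gg \psi \text{ iff } \forall\sigma.\, \sigma \rhd \phi \rightarrow \sigma \models \psi.$$

Alternatively, $\phi \gg \psi$ iff $\exists\chi.\, [\phi] = [\partial\psi \wedge \chi]$.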

There are two equivalent characterizations of presupposition given in this definition. The first states that one formula presupposes another just in case every context that admits the first satisfies the second. The second states that one formula presupposes another just in case the first is equivalent to the conjunction of the second, marked as presupposed by $\partial$, with some residual formula $\chi$.
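Definitions 4 to 6 can be checked by brute force on a small finite universe. The sketch below is self-contained and assumes a hypothetical tuple encoding of formulas (not the paper's notation); `presupposes` implements only the first characterization of Definition 6, by enumerating every subset of the universe as a candidate context.

```python
from itertools import chain, combinations

# Hypothetical tuple encoding of formulas (our assumption):
#   ("atom", worlds) | ("not", f) | ("and", f, g) | ("presup", f)
# where `worlds` is the frozenset of worlds supporting the atomic formula.


def update(sigma, phi):
    """Return tau with sigma[phi]tau, or None when the update is undefined."""
    tag = phi[0]
    if tag == "atom":
        return sigma & phi[1]
    if tag == "not":
        upsilon = update(sigma, phi[1])
        return None if upsilon is None else sigma - upsilon
    if tag == "and":
        upsilon = update(sigma, phi[1])
        return None if upsilon is None else update(upsilon, phi[2])
    if tag == "presup":
        return sigma if satisfies(sigma, phi[1]) else None
    raise ValueError(f"unknown connective: {tag}")


def satisfies(sigma, phi):
    """Definition 4: sigma |= phi iff sigma[phi]sigma."""
    return update(sigma, phi) == sigma


def admits(sigma, phi):
    """Definition 5: sigma admits phi iff some tau has sigma[phi]tau."""
    return update(sigma, phi) is not None


def all_contexts(universe):
    """Every subset of a finite set of worlds, as frozensets."""
    items = sorted(universe)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))


def presupposes(phi, psi, universe):
    """Definition 6 (first characterization), checked over all contexts."""
    return all(satisfies(s, psi) for s in all_contexts(universe) if admits(s, phi))
```

With `p = ("atom", frozenset({"w1", "w2"}))` and `q = ("atom", frozenset({"w1"}))`, the formula `("and", ("presup", p), q)` presupposes `p` but not `q`, in line with the second characterization (here $\chi$ is `q`).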

We now define a new notion which figures in our semantics for adverbial 
quantification. We would like to be able to talk about the presupposition of a 
formula, a notion which we formalize as maximal presupposition, as well as the 
asserted content, which is the non-presupposing residual. 

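Definition 7 (Maximal presupposition).

$$\phi \gg_{max} \psi \text{ iff } \forall\sigma.\, \sigma \rhd \phi \leftrightarrow \sigma \models \psi.$$

Alternatively, $\phi \gg_{max} \psi$ iff $\exists\chi.\, [\phi] = [\partial\psi \wedge \chi]$ and $\forall\sigma.\, \sigma \rhd \chi$. We call $\chi$ the asserted content of $\phi$.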
The relationship between the definitions of presupposition and maximal presupposition should be clear. In order for a formula $\psi$ to be the maximal presupposition of another formula $\phi$, $\psi$ must be a presupposition of $\phi$, and further, it must be sufficient that a context satisfy $\psi$ for it to admit $\phi$. The alternative characterization is clearer and allows us to define our notion of asserted content: the residual formula $\chi$ must be non-presupposing.

We introduce one further auxiliary notion, which allows us to extract from a 
set of situations the subset of situations which are part of a particular world. 

Definition 8 (Situation slicing).

For a set of situations $S$ and a possible world $w$, $S_w = \{s \in S \mid s \sqsubseteq w\}$.

We now add an operator to our language that corresponds to adverbial quan- 
tification. Unlike the qadverb-like operators introduced in the accounts discussed 
above — two-place operators, analogous to quantificational determiners, but 
with an unspecified restrictor argument — our operator takes only a single sen- 
tential argument. This brings it closer to the syntactic type of a qadverb. 

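Definition 9 (Adverbial quantification).

$\sigma[Q(\phi)]\tau$, where $Q$ is a quantifier whose denotation is $\mathbf{Q}$, iff $\exists\psi, \chi, R, S$:

1. $\sigma \rhd \phi$,

2. $\phi \gg_{max} \psi$,

3. $R = \{s \mid \exists w \in \sigma.\, s \sqsubseteq w \text{ and } s \Vdash \psi\}$,

4. $\chi$ is the asserted content of $\phi$,

5. $S = \{s \in R \mid \exists s'.\, s \sqsubseteq s' \text{ and } s' \Vdash \psi \circ \chi\}$,

6. $\tau = \{w \in \sigma \mid \mathbf{Q}(R_w, S_w)\}$.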

The first three conditions together have the effect of creating a domain of quantification (the set $R$) out of the presuppositions of the scope $\phi$ and requiring that the input context satisfy the existence of this domain. Because the input context satisfies $\psi$, the maximal presupposition of the scope, each possible world in the input context must support $\psi$, which in turn means that each possible world must have sub-situations which are characterized by $\psi$. These sub-situations form the domain $R$. To form the nuclear scope set, we must find those situations that stand in the modal satisfaction relation with $\phi$, which we reinterpret as the modal formula $\psi \circ \chi$. The nuclear scope set $S$ is then the set of those members of $R$ which can be extended into a situation which satisfies $\psi \circ \chi$. The final condition outputs those worlds in the input for which the properly restricted domain set and scope set stand in the quantifier relation.

Returning to the example discourse, if we take $p$ to be the specific proposition that John plays (i.e. is playing) Marvin at badminton and $q$ to be the specific proposition that John wins, we might represent the adverbially quantified sentence as the formula $\mathrm{USU}(\partial p \wedge q)$. By the first condition of the definition of adverbial quantification, this formula is admitted by a context $\sigma$ just in case $\partial p \wedge q$ is admitted by $\sigma$. It should be clear that $\sigma$ admits $\partial p \wedge q$ just in case every world in $\sigma$ supports the proposition $p$. A world $w$, in turn, supports $p$ just in case one (or more) of its constituent situations is a situation of John playing Marvin at badminton. Thus, $\sigma$ admits $\mathrm{USU}(\partial p \wedge q)$ just in case every world in $\sigma$ has constituent situations of John playing Marvin at badminton.

These situations of John playing Marvin at badminton constitute the domain 
of quantification for the qadverb. Thus, our formula both incorporates scopal 
presuppositions into its restriction and carries the presupposition that its domain 
exists. Updating a with this formula results in a context containing just those 
worlds in which most situations of John playing Marvin at badminton can be 
extended to situations of John playing Marvin at badminton and winning. 
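The badminton example can be run on a toy model of Definition 9. This is a sketch under strong simplifying assumptions of our own: a situation or world is a frozenset of fact labels, $s \sqsubseteq t$ is subset inclusion, the existential $s'$ of step 5 ranges over a finite pool of situations, and characterization ($s \Vdash \psi$, $s \Vdash \psi \circ \chi$) is supplied as Python predicates. Steps 2 and 4 (computing $\psi$ and $\chi$) are taken as given, and the admittance check of step 1 is approximated by requiring every world to contain a $\psi$-situation. All function names (`adverbial_update`, `most`, `build_world`, and so on) are hypothetical.

```python
def part_of(s, t):
    # Toy mereology: s ⊑ t iff every fact of s is a fact of t.
    return s <= t


def sliced(situations, w):
    """Definition 8: the situations that are part of world w."""
    return {s for s in situations if part_of(s, w)}


def adverbial_update(sigma, pool, char_psi, char_psi_chi, quantifier):
    """Steps 3, 5 and 6 of Definition 9; None signals presupposition failure."""
    R = {s for s in pool if char_psi(s) and any(part_of(s, w) for w in sigma)}
    S = {s for s in R if any(part_of(s, t) and char_psi_chi(t) for t in pool)}
    if any(not sliced(R, w) for w in sigma):
        return None  # some world has no ψ-situation, so σ does not admit the formula
    return frozenset(w for w in sigma if quantifier(sliced(R, w), sliced(S, w)))


def most(R_w, S_w):
    """A simple 'most': more of the domain lies inside the scope set than outside."""
    return len(R_w & S_w) > len(R_w - S_w)


def build_world(games):
    """A world's facts: one play fact per game, plus a win fact if John won."""
    facts = set()
    for game, won in games:
        facts.add(f"play:{game}")
        if won:
            facts.add(f"win:{game}")
    return frozenset(facts)


def build_pool(games):
    """Candidate situations: each bare game, and each won game with its win."""
    pool = set()
    for game, won in games:
        pool.add(frozenset({f"play:{game}"}))
        if won:
            pool.add(frozenset({f"play:{game}", f"win:{game}"}))
    return pool


def is_game(s):  # s ⊩ ψ: exactly one playing fact
    return len(s) == 1 and next(iter(s)).startswith("play:")


def is_won_game(s):  # s ⊩ ψ ∘ χ: one playing fact plus the matching win
    plays = [fact for fact in s if fact.startswith("play:")]
    return len(s) == 2 and len(plays) == 1 and "win:" + plays[0][len("play:"):] in s


games_a = [("a1", True), ("a2", True), ("a3", False)]   # John wins 2 of 3
games_b = [("b1", False), ("b2", True), ("b3", False)]  # John wins 1 of 3
w_a, w_b = build_world(games_a), build_world(games_b)
pool = build_pool(games_a) | build_pool(games_b)
result = adverbial_update(frozenset({w_a, w_b}), pool, is_game, is_won_game, most)
```

Here `result` keeps only `w_a`, the world in which most badminton situations extend to situations of John winning; adding a world with no games makes the update undefined.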



3.3 Accommodation as Context Selection 

Although we have emphasized the absence of intermediate accommodation as a feature of our analysis, we still need a general account of accommodation. Even in our example discourse, some notion of accommodation is needed to explain the felicity of the adverbially quantified sentences. The sentence we have been focusing on presupposes that there are one or more situations of John playing Marvin at badminton. Unfortunately, the preceding discourse makes no claim regarding whether or not they have ever played badminton, and thus the context could presumably include worlds in which they never have. Such a context would fail to admit our sentence.

Heim provides a mechanism of accommodation much like van der Sandt’s 
— simply add presuppositions that are not already present in the context. This 
would solve the problem of admittance, but it would fail to account for the 
anaphoric link between the presupposed situations of John and Marvin’s bad- 
minton playing and the asserted situations of their game playing. 

Beaver (ij suggests an alternative view of accommodation. Instead of identi- 
fying a discourse participant’s information state with a context, he takes it to be 
a set of contexts, ordered according to plausibility. Updating such an informa- 
tion state with an utterance consists of updating each of the member contexts, 
throwing out contexts that cannot be updated. Contexts, then, are a way to 
encode commonsense knowledge. 

For example, a commonsensical hearer should be aware that two people who played various games might include badminton among those games. Such a hearer begins with an information state which includes one or more contexts which satisfy a rule along the lines of if two people play games together, some of those games may be badminton. After updating with the first sentence of our discourse, every world in those contexts includes a set of situations of John and Marvin playing games, some of which are games of badminton. Of course, other (possibly more plausible) contexts would have rules involving other games, but when the hearer gets to the adverbially quantified sentence, only those contexts with the badminton rule are admissible. The domain of the qadverb usually, then, is the set of those situations of John and Marvin playing badminton, which is a subset of those situations of John and Marvin playing games. This is an admittedly informal characterization, but we hope that it is at least suggestive of an approach to integrating our analysis into a general account of bridging and partial matches.
4 Conclusion 

Our analysis of adverbial quantification has several advantages over an account 
framed in terms of van der Sandt’s binding theory. Principally, no recourse to ac- 
commodation is required to explain the incorporation of scopal presuppositions 
into a qadverb’s domain of quantification. On our account, a qadverb domain 
is determined directly through the satisfaction of scopal presuppositions — it is 
composed of the sub-situations of the satisfying worlds that are characterized 
by the presuppositions. Furthermore, no separate presupposition needs to be 
computed to account for the context-dependence of the qadverb domain. De- 
termining the domain through contextual satisfaction of scopal presuppositions 
ensures that the context contains the domain set. Finally, the qadverb-like op- 
erator we introduce is more similar in its syntactic type to natural language 
qadverbs than the two-place operator normally proposed. 

There are some obvious deficiencies with this account. One is that we have 
not formally related it to an account of accommodation, although we have in- 
formally outlined one possible avenue of exploration. Another important caveat 
is that we cannot seriously propose a formula like $\partial p \wedge q$ as a compositional 
semantic interpretation of the sentence John beat Marvin at badminton. Rather, 
we must develop a realistic compositional semantics that would assign to such 
a sentence an interpretation which both presupposed something like p and as- 
serted something like q. Furthermore, we must hope that such an interpretation 
would render unnecessary our admittedly awkward reinterpretation of the scope 
of a qadverb as a modal formula. 

Beaver, in other work pi] , has demonstrated that it is possible to give a com- 
positional treatment of predication and quantification over ordinary individuals 
that yields interpretations that are equivalent, at the propositional level, to for- 
mulas like the ones we have been working with in this paper. Of course, it is not 
entirely clear how to extend the static semantic notion that relates situations to 
formulas — satisfaction/characterization — to work with the dynamic variable 
binding required for an account of quantification. Aim’s dynamic extension of 
Schubert first-order models |Ei| may provide a starting point, but even as it 
is, the static conditions for characterization are uncomfortably unrelated to the 
update semantics for the rest of the language. 

In at least one respect, there is a bright side to this last deficiency. We 
have provided an account of adverbial quantification that is not parasitic on 
an account of nominal quantification. Given that the two phenomena diverge 
significantly with respect to presupposition incorporation, it is a small victory 
to be able to give them naturally heterogeneous accounts. 
Abstract. A theory of contextual propositions for indicative conditionals is presented. The main challenge is to give a precise account of how the dynamics of possible worlds depends on epistemic context. Robert Stalnaker suggested in [Stalnaker] that even when selection functions for evaluating indicatives cannot be defined in terms of epistemic context, they can be importantly constrained by a principle of context dependency that we adopt here. In addition, we show how to define a gradation of possibilities for each point in an epistemic context by taking into account a proposal first introduced by Wolfgang Spohn in [Spohn] and later refined by Darwiche and Pearl in [Darwiche & Pearl]. The resulting theory of contextual propositions (unlike some alternative views) is shown to be compatible with basic qualitative consequences of the Bayesian principle of conditionalization (which is frequently used in probabilistic semantics for indicative conditionals).



1 Introduction 

There is an area of pragmatics and semantics where the problems of context-dependency are particularly poignant. The area in question is concerned with the semantics of indicative conditionals. While many assume that other conditionals (subjunctives, for example) express propositions and carry probability, there is considerably less consensus about that regarding indicatives (see, for example, [Gibbard 81]). In this paper I propose to revisit the problem of the semantics of indicatives and to use it as a tool in order to think about context-dependency in semantics. I shall argue that a theory of contextual propositions for indicatives is possible, but that in order to build it up it is necessary to radicalize assumptions about context dependency common in semantics and pragmatics. In particular I shall argue that the right theory requires making the dynamic properties of possible worlds dependent on epistemic context. The picture that thus arises confirms the ideas that many have expressed before concerning the hidden indexicality in conditional statements [van Fraassen 76]. Indicatives do express contextual propositions, which are highly sensitive to epistemic context. But a theory of such propositions, which is also compatible with basic epistemological and semantic tenets, requires us to rethink the role of possible worlds in semantic constructions and in formal epistemology.
Frank Ramsey provided basic intuitions about the semantics of conditionals in a footnote of a paper on laws and causality [Ramsey 90]. Here is the passage that the followers of at least three orthogonal research programs in the semantics of conditionals see at the root of their proposals (for an overview of both epistemic and ontic theories see [Cross & Nute 98]):

...the belief on which the man acts is that if he eats the cake he will be 
ill, taken according to our above account as a material implication. We 
cannot contradict this proposition either before or after the event, for 
it is true provided the man doesn’t eat the cake, and before the event 
we have no reason to think he will eat it, and after the event we know 
he hasn’t. Since he thinks nothing false, why do we dispute with him or 
condemn him?(1) Before the event we do differ from him in a quite clear 
way: it is not that he believes p, we ¬p; but he has a different degree of 
belief in q given p from ours ; and we can obviously try to convert him 
to our view. But after the event we both know that he did not eat the 
cake and that he was not ill ; the difference between us is that he thinks 
that if he had eaten it he would have been ill, whereas we think he would 
not. But this is prima facie not a difference of degrees of belief in any 
proposition, for we both agree as to all the facts. 

The footnote (1) in the above text provides further clarification: 

If two people are arguing ‘If p, then q?’ and are both in doubt as to 
p, they are adding p hypothetically to their stock of knowledge and 
arguing on that basis about q; so that in a sense ‘If p, q’ and ‘If p, ¬q’ are 
contradictories. We can say that they are fixing their degree of belief in 
q given p. If p turns out false, these degrees of belief are rendered void. 

If either party believes not p for certain, the question ceases to mean 
anything to him except as a question about what follows from certain 
laws or hypotheses.¹

In order to illustrate Ramsey’s example we can use a small universe of four points (or possible worlds) $\{w_1, w_2, w_3, w_4\}$. Say that in $w_1$ and $w_2$ it is true that the cake is good; and worlds $w_1$ and $w_3$ are situations where the man does eat the cake. In addition, world $w_3$, where the man eats the cake and the cake is bad, is also a situation where the man is ill.

An agent can have a prior probability distribution according to which: $P(w_1) = .04$, $P(w_2) = .01$, $P(w_3) = .16$, $P(w_4) = .79$. The idea is that the agent puts a high probability (.8) on the event that the man will not eat the cake, and most of this probability mass concentrates on the event that the man will not eat it when it is bad, etc. A second agent assessing the situation might diverge from him regarding his degrees of belief. For example, he might swap the probabilities attributed to $w_1$ and $w_3$, i.e. this second agent might have a probability function $P'$ which coincides with $P$, except that $P'(w_1) = .16$ and $P'(w_3) = .04$. The



¹ See [Ramsey 90], pages 154-55. 
conditional probabilities of these two agents differ in an important manner. In particular the probability that the man is ill conditional on his eating the cake is .8 for the first agent and .2 for the second agent. So, if the probability of a conditional is conditional probability,² these two agents assign very different probabilities to: ‘if the man eats the cake, he will be ill’.
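The two conditional probabilities can be checked directly. The following sketch encodes the four worlds of the cake example as described in the text and uses exact fractions; the names `prob`, `cond`, `EATS` and `ILL` are our own, and `ILL` assumes (as the text suggests) that the man is ill only in $w_3$.

```python
from fractions import Fraction

# Worlds of the cake example, as described in the text:
#   w1: good cake, eaten         w2: good cake, not eaten
#   w3: bad cake, eaten (ill)    w4: bad cake, not eaten
P = {"w1": Fraction(4, 100), "w2": Fraction(1, 100),
     "w3": Fraction(16, 100), "w4": Fraction(79, 100)}
# The second agent swaps the probabilities of w1 and w3.
P_second = {**P, "w1": Fraction(16, 100), "w3": Fraction(4, 100)}

EATS = {"w1", "w3"}
ILL = {"w3"}


def prob(dist, event):
    """Probability of an event (a set of worlds)."""
    return sum(dist[w] for w in event)


def cond(dist, a, b):
    """Ratio definition of P(a | b); undefined when P(b) = 0."""
    pb = prob(dist, b)
    if pb == 0:
        raise ZeroDivisionError("conditioning on an event of probability zero")
    return prob(dist, a & b) / pb


print(cond(P, ILL, EATS), cond(P_second, ILL, EATS))  # 4/5 1/5
```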

Of course, as Ramsey points out, before the fact neither agent can contradict the proposition that either the man does not eat the cake or he will be ill. In fact, before the fact neither agent has a formed belief concerning whether the man eats or does not eat the cake. The corresponding material conditional cannot be contradicted after the fact either. But in addition, after the fact the degrees of belief of the two agents converge. After the fact (i.e. after the man decides not to eat the cake) both agents assessing the situation assign zero probability to $w_1$ and to $w_3$. And they converge in assigning $P(w_2) = .0125$ and $P(w_4) = .9875$. Both agents are sure that the man did not eat the cake (this proposition receiving measure one). Of course, I am assuming here that both agents condition on the fact that the man did not eat the cake.
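The convergence after the fact is just Bayesian conditionalization on $\{w_2, w_4\}$. The sketch below (with our own helper name `conditionalize`) conditions both agents' priors from the cake example and shows that they end up with the same posterior, since the worlds on which they disagree receive probability zero.

```python
from fractions import Fraction

P = {"w1": Fraction(4, 100), "w2": Fraction(1, 100),
     "w3": Fraction(16, 100), "w4": Fraction(79, 100)}
P_second = {**P, "w1": Fraction(16, 100), "w3": Fraction(4, 100)}


def conditionalize(dist, event):
    """Bayesian conditionalization: zero outside `event`, renormalize inside it."""
    total = sum(p for w, p in dist.items() if w in event)
    if total == 0:
        raise ZeroDivisionError("cannot conditionalize on a measure-zero event")
    return {w: (p / total if w in event else Fraction(0)) for w, p in dist.items()}


NOT_EATS = {"w2", "w4"}
after_first = conditionalize(P, NOT_EATS)
after_second = conditionalize(P_second, NOT_EATS)
print(float(after_first["w2"]), float(after_first["w4"]))  # 0.0125 0.9875
```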

Ramsey points out in addition that even after this convergence in degrees 
of belief, it seems that the two agents in our story can still be differentiated. 
The difference being that the first agent will still accept that if the man had 
eaten the cake he would have been ill, while the second agent will reject this 
conditional. This new conditional is often classified as a subjunctive (as opposed 
to the indicatives considered above). 

Notice that before the fact the crucial issue was the evaluation of conditional beliefs. In a nutshell, the important thing is evaluating $P(w_3 \mid w_1 \cup w_3)$. For one agent this conditional probability is high (.8); for the other agent it is low (.2). Why not use the same strategy after the fact? The reason is that after the fact this conditional probability is not defined, because $P(w_1 \cup w_3) = 0$. So, degrees of belief conditional on $w_1 \cup w_3$ are, as Ramsey says, void. In other words, in order to evaluate ‘if the man had eaten the cake he would have been ill’ one needs to make a supposition (that the man had eaten the cake) which contradicts an event of measure one. And this cannot be done with the apparatus of standard probability theory (for recent work on using non-standard measures in the semantics of conditionals see [McGee 94] and [Arlo-Costa 01]).

The analysis of indicatives is done with the help of two elements: a context set [Stalnaker 84] [Stalnaker 98] consisting of a space of possible worlds, and a probability distribution defined over them. The evaluation of conditionals is then sensitive to both qualitative and numerical differences in the representation. Before the fact, differences in the attitudes towards conditionals are ultimately traced back to differences in degrees of belief, which are registered in terms of



² The status of this thesis (that probabilities of conditionals are conditional probabilities) is quite problematic. There is a consensus to the effect that it can be saved for non-nested conditionals. In addition, van Fraassen has offered an indexical interpretation of conditional propositions for which stronger variants of the thesis also hold [van Fraassen 76]. More robust versions of the thesis for non-standard probability are offered in [McGee 94] and
the probability function defined over the context set. Recent work extending Ramsey’s ideas has proposed to adopt a generalized notion of conditional probability (adopted as an epistemological primitive) which allows for conditioning on events of measure zero. Alternatively, the idea is to supplement the standard representational tools of Bayesian theory with a metric that can make comparisons between events of measure zero. In our example one can put the problem as follows: even when both $w_1$ and $w_3$ carry zero measure (after the fact), which is the state that one judges more plausible from the point of view of the context set composed of $w_2 \cup w_4$? This is equivalent to supplementing the context set, and the probability function defined over it, with a belief revision function. In a companion paper I suggested that this should be done in order to have a reasonably powerful representational tool [Arlo-Costa forthcoming]. I shall offer here new reasons for extending context sets with belief revision functions (which are based on basic facts related to the semantics and pragmatics of indicatives).

One important aspect of the representational framework tacitly introduced above resides in the relationships between the underlying context set and the probability function defined over it. One can see the context set as associated with the probability function via a function $\rho$. So, in the above example, $\rho(P) = \{w_1, w_2, w_3, w_4\}$. The idea is to define $\rho(P)$ as the set of points that receive non-zero probability.³ Now, in our example, for any proposition $A$ such that $\rho(P) \cap A \neq \emptyset$, we have that the context set corresponding to the probability function $P$ updated with $A$ ($P_A$) is calculated as $\rho(P) \cap A$. We will use the notation $\rho(P)_A$ to denote the process of updating the context set corresponding to $P$ with the proposition $A$. So, as long as every state in the prior context set receives non-zero probability, we have that:

(Preservation) If $\rho(P) \cap A \neq \emptyset$, then $\rho(P)_A = \rho(P_A) = \rho(P) \cap A$.
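Because $\rho(P)$ is defined as the set of worlds with positive probability, Preservation can be checked mechanically on a finite example. The sketch below is our own illustration (the names `rho`, `conditionalize` and `preservation_holds` are hypothetical): it enumerates every proposition $A$ over the worlds of a distribution and compares $\rho(P_A)$ with $\rho(P) \cap A$ whenever the latter is non-empty.

```python
from fractions import Fraction
from itertools import combinations


def rho(dist):
    """Context set of a probability function: the worlds with non-zero probability."""
    return frozenset(w for w, p in dist.items() if p > 0)


def conditionalize(dist, event):
    """Bayesian conditionalization on `event` (assumed to have positive probability)."""
    total = sum(p for w, p in dist.items() if w in event)
    if total == 0:
        raise ZeroDivisionError("cannot conditionalize on a measure-zero event")
    return {w: (p / total if w in event else Fraction(0)) for w, p in dist.items()}


def preservation_holds(dist):
    """Check rho(P_A) == rho(P) ∩ A for every A with rho(P) ∩ A non-empty."""
    worlds = sorted(dist)
    for r in range(1, len(worlds) + 1):
        for combo in combinations(worlds, r):
            A = frozenset(combo)
            if rho(dist) & A and rho(conditionalize(dist, A)) != rho(dist) & A:
                return False
    return True


# Prior from the cake example.
P = {"w1": Fraction(4, 100), "w2": Fraction(1, 100),
     "w3": Fraction(16, 100), "w4": Fraction(79, 100)}
```

For instance, `rho(conditionalize(P, {"w2", "w4"}))` is `frozenset({"w2", "w4"})`, matching $\rho(P) \cap A$ for $A = \{w_2, w_4\}$.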



More generally, for any epistemic state $E$ (which can be different from a standard probability function) that is associated with a context set via the function $\rho$, we have:

(Preservation-E) If $\rho(E) \cap A \neq \emptyset$, then $\rho(E)_A = \rho(E_A) = \rho(E) \cap A$.

³ This idea is coherent with a probabilistic outlook. Regarding the epistemological interpretation of $\rho(P)$, in [Arlo-Costa 01] I offer an argument to the effect that it encodes the expectations of the agent described by $P$, rather than his certainties. The set $\rho(P)$ can also be assumed as a primitive without probabilistic interpretation. This yields a more sophisticated account of the relationships between probability and belief. Our arguments here can be stated in either case, but below we will follow the idea that $\rho(P)$ encodes the set of points carrying positive probability.

⁴ See [Gärdenfors 88] for the use of this terminology and for a basic introduction to the theory of belief revision. Here $P_A(X \mid Y) = P(X \mid Y \cap A)$.

⁵ $E$ here can be any representation of the epistemic state. For example, later on we will propose to use rankings or ordinal conditional functions as representations of epistemic states. Those rankings can be substituted for the variable $E$ in the previous formulation.
and when the epistemic state is identified with a belief set $K$, i.e. when $E = K = \rho(E)$:

(Preservation-K) If $K \cap A \neq \emptyset$, then $K_A = K \cap A$.



This is a very simple principle, ultimately justified in terms of the properties of conditionalization. Of course, in our example, we start with a probability function $P$ and $\rho(P) = \{w_1, w_2, w_3, w_4\}$. Then we condition with $A = \{w_2, w_4\}$. As a consequence $P_A = P'$, and $\rho(P') = \{w_2, w_4\}$. In the following analysis we will assume Preservation.

2 Context Set and Indicatives: Stalnaker’s Constraint and Preservation 

Say that one of the agents in our example learns that the cake is in bad shape. Then his context set shrinks to $\{w_3, w_4\}$. From this point of view he accepts ‘if the man eats the cake he will be ill’. As a matter of fact, the corresponding conditional probability is one. Does this conditional express a proposition? If so, how can it be built up with the elements we have?

Propositions are built up from worlds, so a natural manner of proceeding is 
to say that the conditional proposition will be built (if it exists at all) out of a 
set of possible worlds obeying some natural constraint. An obvious constraint 
is to collect all the possible worlds where the conditional in question is true. 
Here is the analysis proposed by Robert Stalnaker in his article on indicative 
conditionals: 

We need a function which takes a proposition (the antecedent of the 
conditional) and a possible world (the world as it is) into a possible 
world (the world as it would be if the antecedent were true). Intuitively 
the value of the function should be that world in which the antecedent is 
true which is most similar, in relevant aspects, to the actual world (the 
world which is one of the arguments of the function). In terms of such 
function - call it f - the semantic rule for the conditional may be stated 
as follows: a conditional if A, then B, is true in a possible world i just in 
case B is true in possible world f(A, i). 

So, now our representational framework has been amplified. In addition to our context set and, eventually, a probability function $P$ on it, we need a primitive selection function $f$ defined for each world and argument $A$. Notice that this new primitive used in the construction is not definable from the initial context set and the probability function $P$. It is a new primitive, and if we follow Stalnaker’s suggestions, the motivation for it is purely ontological. The similarity of worlds has not been used so far in our analysis. For the moment, and for the sake of the argument, I shall expand the basic context set with it. Later on we will have the opportunity of discussing its use.
Are there any useful constraints we can impose on $f$? Notice that so far the function is completely unconstrained. One useful constraint is presented by Stalnaker in his piece on indicative conditionals.

I cannot define the selection function in terms of the context set, but the 
following constraint imposed by the context set on the selection function 
seems plausible: if the conditional is being evaluated at a world in 
the context set, then the world selected must, if possible, be within the 
context set as well (where $C$ is the context set, if $i \in C$, then $f(A, i) \in$ 
$C$). In other words, all worlds within the context set are closer to each 
other than any worlds outside it. The idea is that when a speaker says 
‘If A’, then everything he is presupposing to hold in the actual situation 
is presupposed to hold in the hypothetical situation in which A is true. 
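Stalnaker’s constraint is a simple check once a context set, an antecedent and a selection function are fixed. The following sketch is our own reading of it, not Stalnaker’s formalism: the selection function for a fixed antecedent $A$ is a dictionary from worlds to the set $f(A, i)$ (sets rather than single worlds, as in the examples of this paper), and the “if possible” clause is read as “whenever some $A$-world lies in the context set”. The function name and the world labels `i`, `j`, `k` are hypothetical.

```python
def satisfies_stalnaker_constraint(context, antecedent, selection):
    """For every i in the context set, f(A, i) must lie inside the context set,
    provided this is possible (some A-world belongs to the context set)."""
    if not (antecedent & context):
        return True  # the 'if possible' clause fails, so nothing is required
    return all(selection[i] <= context for i in context)


C = frozenset({"i", "j"})
A = frozenset({"j", "k"})
f_leaves_context = {"i": frozenset({"k"}), "j": frozenset({"j"})}
f_stays_inside = {"i": frozenset({"j"}), "j": frozenset({"j"})}
```

Here `f_leaves_context` violates the constraint (from `i` it selects `k`, outside the context set, although the $A$-world `j` is available inside it), while `f_stays_inside` satisfies it.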

This constraint seems sensible and useful. Even though the selection functions $f$ 
are a new primitive, at least their behavior is constrained by the existing context, 
in a way that seems intuitive. So, since our goal here is to try to understand what 
type of propositions are expressed by indicatives (if any), it seems natural to 
add this constraint to the principle of Preservation (formulated in the previous 
section). Unfortunately this is not possible. I shall use a slightly modified form 
of our example in order to illustrate the problem and in order to begin to present 
a solution. 

We can start with a context set $C = \{w_1, w_2, w_3, w_4\}$. Say that in addition we have a probability distribution on this context such that $P(w_1) = .16$, $P(w_2) = .01$, $P(w_3) = .04$, $P(w_4) = .79$. Say that in $w_1$ and $w_2$ it is true that Gore is elected; and worlds $w_1$ and $w_3$ are situations where Gore wins the popular vote. This could be the context set of an agent prior to the last general election in the USA. The agent in question thinks that it is highly likely that Gore will not be elected and that he will not win the popular vote. Tiny probabilities are assigned to the case where Gore wins the popular vote and is not elected, and to the situation where he is elected even though he does not win the popular vote. We know today that this prior will be modified drastically after the election, but as a prior it is a perfectly possible one. States $w_1$ and $w_4$ can be considered as ‘good’ states, while their complement can be seen as bad outcomes. In the good states Gore is elected when he wins the popular vote, and Gore is not elected when he loses the popular vote. This partition of states into good and bad ones is the qualitative precursor of having a value function for outcomes.

Let’s now consider two possible states of belief definable over the given context set $C$. One of them is $K = \{w_3, w_4\}$, which corresponds to the proposition that Gore has not been elected. Another is $G = \{w_1, w_4\}$, which corresponds to the proposition that the outcome of the election is good. Of course, when the probability measure $P$ is updated either with $K$ or $G$, the context set $C$ shrinks either to $K$ or to $G$. Shifting to $G$ is tantamount to denying any credibility whatsoever to the marginal outcomes where Gore wins the popular vote but is not elected and/or where Gore is elected and loses the popular vote. There might be situations where shifting to such a state might be justified, in the presence of appropriate legal or institutional arrangements (or statistical and poll data before the fact).

We need in addition information about the function $f$. Here is a possible distribution of $f$-values. Let $A = \{w_1, w_3\}$. Then define: $f(A, w_1) = \{w_1\}$, $f(A, w_2) = \{w_1\}$, $f(A, w_3) = \{w_3\}$, $f(A, w_4) = \{w_3\}$. According to Stalnaker ‘intuitively the value of the function should be that world in which the antecedent is true which is most similar, in relevant aspects, to the actual world (the world which is one of the arguments of the function).’ The function proposed above implements the most elemental notion of closeness in terms of mereological similarity (by counting shared atoms under a constraint). There might be, of course, other relevant notions of similarity. The important issue is that most of the existing notions of ontological similarity are context-independent. They are not supposed to be sensitive to changes in epistemic context, being determined absolutely in terms of certain objective features.

Consider then the conditional ‘if Gore wins the popular vote this will be
a good outcome’, where the notion of goodness is determined, as explained
above, via the proposition $G$. We can abbreviate this conditional as $A > G$.
Which proposition does such a conditional express? In order to determine
this, it helps to add a further degree of precision. We are here evaluating
propositions relative to a model $M$, which consists of a prior context set $C$, a
probability function $P$, and a function $f$ for each possible world in the context
set $C$.

$$[A > G]^{M} = \{w \in C : f(A, w) \subseteq G\} = \{w_1, w_2\}$$



It seems intuitive that after shifting (from $C$) to the context set $G$ (‘the
outcome is good’, where ‘bad’ outcomes have zero probability of occurring) one
would be willing to assert ‘if Gore wins the popular vote this is a good outcome’.
The reason is that $G_A = \{w_1\}$, which is a good state of affairs (Gore is
elected and he wins the popular vote). On the other hand, after shifting (from
$C$) to $K$ (Gore is not elected president) one would be willing to accept ‘if Gore
wins the popular vote this will not be a good outcome’. This is so given that
$K_A = \{w_3\}$, which is not a ‘good’ outcome (Gore wins the popular vote but is
not elected president).

Notice that after conditioning on the proposition $G$ (‘the outcome is good’),
the acceptance of ‘if Gore wins the popular vote this is a good outcome’ is
mandated probabilistically. In fact, after conditioning on $G$, all probability is
distributed between the worlds $w_1$ (.168) and $w_4$ (.83). If $P'$ is $P(\cdot \mid G)$, it is clear
that $P'(G \mid A)$ is one, since $G \cap A = \{w_1\}$ carries all of $P'(A)$. So, ‘if Gore wins
the popular vote this is a good outcome’ should be accepted with respect to $P'$.
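The probabilistic point can be checked with exact arithmetic. The prior below is purely hypothetical (the text reports only the values after conditioning on $G$). The conclusion $P'(G \mid A) = 1$ does not depend on it as long as $P(w_1) > 0$, because after conditioning on $G$ the only $A$-world with positive probability is $w_1$.

```python
from fractions import Fraction


def condition(p, evidence):
    """Bayesian conditioning P(. | E): worlds outside E receive probability zero."""
    mass = sum(p[w] for w in evidence)
    if mass == 0:
        raise ValueError("cannot condition on a zero-probability proposition")
    return {w: (p[w] / mass if w in evidence else Fraction(0)) for w in p}


def prob(p, prop):
    """P(X) for a proposition X given as a set of worlds."""
    return sum((p[w] for w in prop), Fraction(0))


def cond_prob(p, b, a):
    """P(B | A) = P(A & B) / P(A)."""
    return prob(p, a & b) / prob(p, a)


A = {"w1", "w3"}  # Gore wins the popular vote
G = {"w1", "w4"}  # the outcome is good

# Hypothetical prior, chosen only for illustration.
prior = {"w1": Fraction(1, 10), "w2": Fraction(2, 10),
         "w3": Fraction(2, 10), "w4": Fraction(5, 10)}
posterior = condition(prior, G)
```

With this prior, $P(G \mid A) = 1/3$ before the shift and $P'(G \mid A) = 1$ after it.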

Notice, nevertheless, that $[A > G]^{M} = \{w_1, w_2\}$. This means that
$[A > G]^{M}$ is not entailed by $G$, contrary to intuition and to the probabilistic
account of acceptance. Notice also that in this situation Stalnaker’s constraint on
selection functions is violated. In fact, after shifting to $G$ the constraint requires that ‘...if
the conditional is being evaluated at a world in the context set, then the world
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selected must, if possible, be within the context set as well’ (the emphasis is mine).
This seems to indicate that when $f(A, w_4)$ is evaluated in the context set $G$, the
constraint requires setting $f(A, w_4)$ to $\{w_1\}$, a world which is not mereologically
similar to $w_4$. If one is guided by an objective criterion for determining the
similarity of worlds (like the one we are using, mereological similarity), then
the content of $f(A, w_4)$ is fixed in a context-independent manner. For example,
according to our simple implementation of mereological distance, $f(A, w_4) = \{w_3\}$.

Perhaps Stalnaker’s criterion can be seen as an extra consideration to be
taken into account when the initial criterion for determining similarity is not
powerful enough to specify the content of all functions. Or it can be read as a
complementary consideration, which can clash with an underlying criterion for
establishing similarity, in such a way that one has to decide case by case. Finally,
it can be seen as a proposal for creating selection functions which depend
on three arguments: the actual world, a given proposition and the given context.¹
Perhaps the latter option is the most charitable manner of interpreting
Stalnaker. It should be noted, nevertheless, that the aforementioned examples
show that this is tantamount to not having a unified criterion for determining
the content of selection functions. Selection functions will pick out the mereologically
closest worlds when this is permitted by the constraint for indicatives, and
other convenient worlds in the context of evaluation otherwise. This is not a very
satisfying situation. In fact, it seems that the result of applying the constraint
is to abandon a principled way of understanding world selection in general.

Let me restate some of the issues just mentioned in a more formal manner.
When we consider mereological similarity as the determining factor in constructing
selection functions, one has $[A > G]^{M} = \{w_1, w_2\}$ and
$[\neg(A > G)]^{M} = [A > \neg G]^{M} = \{w_3, w_4\}$. So we do have
$K \subseteq [\neg(A > G)]^{M}$, but $G$ does not entail $[A > G]^{M}$, against intuition.
More generally, when conditional propositions are rigidly determined for a given context
across the possible epistemic states definable for the context, we do not have the following
bridges, which in the literature usually go by the name of Ramsey tests:

(RT) For every model $M$: $K_A \subseteq B$ iff $K \subseteq [A > B]^{M}$.



and the corresponding: 

(NRT) For every model $M$: $K_A \not\subseteq B$ iff $K \subseteq [\neg(A > B)]^{M}$.
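Both bridges can be tested against the fixed, context-independent selection function of the example. The sketch below hard-codes the $f$-values given earlier for the antecedent $A$ and computes $K_A$ as $K \cap A$ (Preservation, as in the text’s computations of $G_A$ and $K_A$). Because each $f(A, w)$ is a singleton here, $[\neg(A > B)]^{M}$ is computed as $[A > \neg B]^{M}$, as in the text. Function names are hypothetical.

```python
C = frozenset({"w1", "w2", "w3", "w4"})
A = frozenset({"w1", "w3"})
G = frozenset({"w1", "w4"})
K = frozenset({"w3", "w4"})

# Context-independent f(A, w) from the mereological example (antecedent A fixed).
F = {"w1": {"w1"}, "w2": {"w1"}, "w3": {"w3"}, "w4": {"w3"}}


def conditional(consequent):
    """[A > B]^M = {w in C : f(A, w) is a subset of B}."""
    return {w for w in C if F[w] <= consequent}


def negated_conditional(consequent):
    """[not(A > B)]^M, computed as [A > not-B]^M (f yields singletons)."""
    return conditional(C - consequent)


def rt(k, consequent):
    """(RT): K_A <= B iff K <= [A > B]^M; True when both sides agree."""
    return ((k & A) <= consequent) == (k <= conditional(consequent))


def nrt(k, consequent):
    """(NRT): K_A not <= B iff K <= [not(A > B)]^M; True when both sides agree."""
    return (not ((k & A) <= consequent)) == (k <= negated_conditional(consequent))
```

The negative bridge holds for $K$ and $G$ (`nrt(K, G)` is true), but (RT) fails for the context $G$ (`rt(G, G)` is false), which is the asymmetry described in the text.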

We are assuming, of course, that the operation $K_A$ does obey Preservation, for
every $K$ and $A$. As is explained in [Arló-Costa & Levi 96], (RT) is compatible
with a different operation, which is sometimes called imaging in
the literature [Lewis 76]. The idea is to use the underlying $f$-function in order
to construct the revision operation used in the Ramsey test.

¹ The latter option was suggested by the comments of an anonymous referee.

Define, for every
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context set $K$ and proposition $A$ (and for a fixed selection function $f$ across
contexts):

$$K_{\# A} = \bigcup \{ f(A, w) : w \in K \}$$

Then we do have: 

(RT) For every model $M$: $K_{\# A} \subseteq B$ iff $K \subseteq [A > B]^{M}$.



The motivations for the adoption of $\#$ are unclear in the context of our analysis.
It seems that one certainly wants to have Preservation, a notion which is
tightly connected with the use of probabilistic tests in the acceptance of
conditionals. We argued at length above for Preservation, by showing how it is
anchored in the use of Bayesian conditioning. In order to see that Preservation
is violated by uses of $\#$, notice that $G_{\# A} = \{w_1, w_3\} \neq G \cap A = \{w_1\}$.
Notice also that this is a violation of Stalnaker’s constraint. In fact, relative to
the context $G$, the criterion mandates that $w_1$ should be the ‘closest’ world to $w_4$,
a world that is not mereologically close to $w_4$.
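The imaging operation and the failure of Preservation can be checked in the same toy setting. The sketch below (same hard-coded $f$-values; names hypothetical) computes $K_{\# A}$ as the union of the selected worlds. It then verifies by brute force over all $K, B \subseteq C$ that (RT) holds when $K_{\# A}$ replaces $K_A$, and shows that $G_{\# A} \neq G \cap A$.

```python
from itertools import combinations

C = frozenset({"w1", "w2", "w3", "w4"})
A = frozenset({"w1", "w3"})
G = frozenset({"w1", "w4"})
F = {"w1": {"w1"}, "w2": {"w1"}, "w3": {"w3"}, "w4": {"w3"}}  # f(A, w), A fixed


def imaging(k):
    """K_{#A}: the union of f(A, w) for w in K."""
    out = set()
    for w in k:
        out |= F[w]
    return out


def conditional(consequent):
    """[A > B]^M = {w in C : f(A, w) is a subset of B}."""
    return {w for w in C if F[w] <= consequent}


def subsets(xs):
    """All subsets of xs, as frozensets."""
    xs = sorted(xs)
    return [frozenset(c) for n in range(len(xs) + 1) for c in combinations(xs, n)]


# (RT) with imaging, for every K and B: K_{#A} <= B iff K <= [A > B]^M.
rt_with_imaging = all(
    (imaging(k) <= b) == (k <= conditional(b))
    for k in subsets(C) for b in subsets(C)
)
```

Since $G_{\# A}$ contains $w_3 \notin G$ even though $G \cap A \neq \emptyset$, imaging validates (RT) at the price of Preservation.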

Is there any way of constructing conditional propositions for indicatives that
respect both (a strict version of) Stalnaker’s constraint and Preservation? In other
words, is there a unified epistemic criterion for determining the content of
selection functions, respecting both Preservation and Stalnaker’s constraint? In
the next section I shall argue that it is indeed possible, as long as the propositions
in question are dependent on epistemic context.

3 Contextual Propositions for Indicatives 

If the function $f$ used in the model picks out worlds according to measures of
overall similarity, then it makes perfect sense to have one and only one selection
function for each world in the model. If the function encodes mereological
similarity, then it should be a matter of fact whether $w_4$ is most similar (among
$A$-worlds) to $w_1$ or to $w_3$. Perhaps one can just count atoms and say that
$w_3$ is the ‘closest’ world to $w_4$, period.

It seems that in order to accommodate the Ramsey tests, $f(A, w_4)$ should
yield $w_1$ when evaluated at $G$, and $w_3$ when evaluated at $K$. But this, of course,
cannot be articulated in terms of overall mereological similarity. We should
remind the reader here that when the $f$-function was introduced we pointed out
that we would adopt it for the sake of the argument, in order to present Stalnaker’s
constraint. We pointed out, nevertheless, that the function in question seemed
an ontological intrusion into what is otherwise an epistemological account. Here
is a proposal for retaining some of the role of the $f$-function in constructing
propositions, while presenting its motivation in epistemic terms.

In order to present the new proposal, let’s go back to our motivation for
supplementing conditioning with rankings of worlds which receive probability
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zero after conditioning. Let’s focus on the initial context $C$. After one conditions
on the proposition $K$ (Gore is not elected president), the worlds $w_1$ and $w_2$
receive zero measure. But we can still order these two worlds in different manners
according to their degree of plausibility with respect to $K$. The formal tool needed
here is what the philosopher W. Spohn calls an ordinal conditional function
[Spohn 87]. Here we just call them rankings. A ranking is a function $\kappa$ from the
set of all interpretations of the underlying language (worlds) into the class of
ordinals. A ranking is extended to propositions by requiring that the rank of a
proposition be the smallest rank assigned to a world that satisfies it:

$$\kappa(A) = \min\{\kappa(w) : w \models A\}$$
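As a minimal sketch, a ranking over finitely many worlds can be stored as a dictionary from worlds to natural numbers (finite ordinals suffice for the examples in this section). The rank of a proposition is then a minimum. The convention $\kappa(\emptyset) = \infty$ is an assumption not stated in the text. The belief set $\rho(\kappa)$, the worlds of rank 0, is included as well; the function names are hypothetical.

```python
import math


def rank_of(kappa, prop):
    """kappa(A) = min{kappa(w) : w in A}; the empty proposition gets rank infinity."""
    return min((kappa[w] for w in prop), default=math.inf)


def belief_set(kappa):
    """rho(kappa): the worlds of rank 0."""
    return {w for w, r in kappa.items() if r == 0}


# The ranking kappa_K from the text: w3, w4 at rank 0; w1 at rank 1; w2 at rank 2.
kappa_K = {"w1": 1, "w2": 2, "w3": 0, "w4": 0}
```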

The set of models corresponding to the belief set $\rho(\kappa)$ associated with a
ranking $\kappa$ is the set $\{w : \kappa(w) = 0\}$. There are various ways of understanding what
a ranking is. Spohn’s own interpretation is that $\kappa(A) < \kappa(B)$ means that $A$ is
less disbelieved than $B$. Spohn’s formalism is, in turn, an elaboration of Shackle’s
notion of potential surprise [Shackle 61]. There are also various possible manners
of updating rankings. Spohn’s methods for updating rankings are in accordance
with probability theory where probability takes infinitesimal values. Here we will
appeal to a method of revision proposed in the Artificial Intelligence literature by
A. Darwiche and J. Pearl. Here is the update rule as presented
in [Darwiche & Pearl 97]:

$$(\kappa \bullet A)(w) = \begin{cases} \kappa(w) - \kappa(A) & \text{if } w \models A \\ \kappa(w) + 1 & \text{otherwise} \end{cases}$$
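The update rule translates directly into code. The sketch below (function names hypothetical) implements $(\kappa \bullet A)$ for a finite ranking and reproduces the two updates with $\{w_4\}$ that the text discusses for the rankings $\kappa_K$ and $\kappa_G$. Revising by the empty proposition is left undefined.

```python
def revise(kappa, prop):
    """(kappa . A)(w) = kappa(w) - kappa(A) if w in A, and kappa(w) + 1 otherwise."""
    if not prop:
        raise ValueError("revision by the empty proposition is undefined here")
    r = min(kappa[w] for w in prop)  # kappa(A)
    return {w: (kappa[w] - r if w in prop else kappa[w] + 1) for w in kappa}


kappa_K = {"w1": 1, "w2": 2, "w3": 0, "w4": 0}
kappa_G = {"w1": 0, "w2": 2, "w3": 1, "w4": 0}

after_K = revise(kappa_K, {"w4"})  # w4 at 0, w3 at 1, w1 at 2, w2 at 3
after_G = revise(kappa_G, {"w4"})  # w4 at 0, w1 at 1, w3 at 2, w2 at 3
```

In both cases $w_4$ ends up alone at rank 0, but the next level differs: $w_3$ after $\kappa_K$, $w_1$ after $\kappa_G$.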

The selection of this dynamics is based on purely pragmatic considerations.
The reader will see below that it offers a convenient encoding of change for our
representational goals. There are, to be sure, other feasible dynamics that could be
used instead; of those, this is perhaps the simplest. The central feature
of the dynamics that interests us is that if $K$ is the set of points in rank 0 and
$A \cap K \neq \emptyset$, then after updating with any of the points in $K$, the closest $A$-points with
respect to this point are also $K$-points.
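The central feature can be confirmed by brute force on a small universe. The sketch below enumerates every ranking of four worlds with ranks 0 to 2 (or another bound) that puts at least one world at rank 0, takes $K = \rho(\kappa)$, and checks that for every $A$ meeting $K$ and every $w \in K$, the most plausible $A$-worlds after updating with $\{w\}$ lie inside $K$. A bounded search like this illustrates the claim rather than proving it in general; the names are hypothetical.

```python
from itertools import combinations, product

WORLDS = ("w1", "w2", "w3", "w4")


def revise(kappa, prop):
    """Darwiche-Pearl-style update (kappa . A) as given in the text."""
    r = min(kappa[w] for w in prop)
    return {w: (kappa[w] - r if w in prop else kappa[w] + 1) for w in kappa}


def most_plausible(kappa, prop):
    """The A-worlds of minimal rank."""
    r = min(kappa[w] for w in prop)
    return {w for w in prop if kappa[w] == r}


def nonempty_subsets(xs):
    return [set(c) for n in range(1, len(xs) + 1) for c in combinations(xs, n)]


def central_feature_holds(max_rank=2):
    for ranks in product(range(max_rank + 1), repeat=len(WORLDS)):
        kappa = dict(zip(WORLDS, ranks))
        k = {w for w in WORLDS if kappa[w] == 0}
        if not k:
            continue  # skip: a ranking must put some world at rank 0
        for a in nonempty_subsets(WORLDS):
            if not (a & k):
                continue  # the feature only concerns A that meets K
            for w in k:
                if not (most_plausible(revise(kappa, {w}), a) <= k):
                    return False
    return True
```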

One can see $G$ and $K$ as belief sets associated with richer representations of
context, $\kappa_G$ and $\kappa_K$, in such a way that $\rho(\kappa_G) = G$ and $\rho(\kappa_K) = K$.



Ranking $\kappa_K$

Rank  Possible worlds
0     $w_3$, $w_4$
1     $w_1$
2     $w_2$



Notice that if one updates the ranking in question with the proposition $\{w_4\}$,
the updated ranking has $w_4$ in rank zero and $w_3$ in rank 1. On the other hand, one
can have the following ranking giving the dynamic properties associated with $G$.
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Ranking $\kappa_G$

Rank  Possible worlds
0     $w_1$, $w_4$
1     $w_3$
2     $w_2$



And the result of updating this second ranking with $\{w_4\}$ preserves $w_4$ in rank
zero, but locates $w_1$ in rank 1. The intuition here is that in order to determine
the content of the $f$-functions $f(A, w)$ for each world $w$ in the context set $K$,
one should utilize the ranking $\kappa_K$ that has $\rho(\kappa_K) = K$ and then shift to
$\kappa_K \bullet (\{w\}) = \kappa_w$. This ranking is then used to calculate $\kappa_w(A) = f(A, w)$,
where $\kappa_w(A)$ here denotes the set of most plausible $A$-worlds according to $\kappa_w$.
It is important to notice that under this definition the content of $f(A, w)$ need not be
a singleton. In this respect we are following the advice of David Lewis
in […] rather than the account of selection functions offered by Stalnaker.
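The recipe of this paragraph (shift to $\kappa_w = \kappa \bullet (\{w\})$, then read off the most plausible $A$-worlds) can be sketched as follows. Function names are hypothetical, and the rankings are $\kappa_K$ and $\kappa_G$ from the tables in the text. Returning a set rather than a single world reflects the remark that $f(A, w)$ need not be a singleton.

```python
def revise(kappa, prop):
    """(kappa . A) as given in the text."""
    r = min(kappa[w] for w in prop)
    return {w: (kappa[w] - r if w in prop else kappa[w] + 1) for w in kappa}


def most_plausible(kappa, prop):
    """The A-worlds of minimal rank."""
    r = min(kappa[w] for w in prop)
    return {w for w in prop if kappa[w] == r}


def select(kappa, antecedent, w):
    """f(A, w): the most plausible A-worlds in kappa_w = kappa . ({w})."""
    return most_plausible(revise(kappa, {w}), antecedent)


A = {"w1", "w3"}
kappa_K = {"w1": 1, "w2": 2, "w3": 0, "w4": 0}
kappa_G = {"w1": 0, "w2": 2, "w3": 1, "w4": 0}

f_K = {w: select(kappa_K, A, w) for w in sorted(kappa_K)}
f_G = {w: select(kappa_G, A, w) for w in sorted(kappa_G)}
```

From $w_4$’s point of view the most plausible $A$-world is $w_3$ under $\kappa_K$ but $w_1$ under $\kappa_G$. The full `f_G` table matches the selection function the text later gives for $G$.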

Since the act of indicative supposition typically requires entertaining
propositions that are not belief-contravening, we can start with a flat initial context
$C$ (determined by the points receiving non-zero measure in an initial measure)
and use the Darwiche-Pearl algorithm in order to construct rankings naturally
associated with each subset of $C$, including each point in $C$. If we proceed
this way, we would have:



Ranking $\kappa'_K$

Rank  Possible worlds
0     $w_3$, $w_4$
1     $w_1$, $w_2$

and



Ranking $\kappa'_G$

Rank  Possible worlds
0     $w_1$, $w_4$
1     $w_3$, $w_2$
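Starting from the flat context $C$, in which every world has rank 0, a single Darwiche-Pearl update with $K$ or with $G$ yields exactly the two rankings $\kappa'_K$ and $\kappa'_G$. A minimal check, with hypothetical names:

```python
def revise(kappa, prop):
    """(kappa . A) as given in the text."""
    r = min(kappa[w] for w in prop)
    return {w: (kappa[w] - r if w in prop else kappa[w] + 1) for w in kappa}


FLAT = {w: 0 for w in ("w1", "w2", "w3", "w4")}  # flat initial context C
K = {"w3", "w4"}  # Gore is not elected
G = {"w1", "w4"}  # the outcome is good

kappa_K_prime = revise(FLAT, K)  # w3, w4 at rank 0; w1, w2 at rank 1
kappa_G_prime = revise(FLAT, G)  # w1, w4 at rank 0; w2, w3 at rank 1
```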



Even though these rankings carry poorer information concerning belief-contravening
worlds, they can perfectly well be utilized in order to carry out our
analysis. The underlying intuition is simple. What is most plausible from the
point of view of a world $w$ depends on epistemic context. If one is in $\kappa_K$ ($\kappa'_K$),
then the most plausible $A$-world from $w_4$’s point of view is $w_3$. In other words,
$\kappa'_K \bullet (\{w_4\})(A) = \{w_3\}$. On the other hand, if one is in $\kappa_G$ ($\kappa'_G$), then the most
plausible $A$-world from $w_4$’s point of view is $w_1$. In other words,
$\kappa'_G \bullet (\{w_4\})(A) = \{w_1\}$. So here we are adopting an epistemic strategy all the way down to
the possible worlds that are components of context. When we have not only the
context set but also the corresponding function $\kappa$ for it, we can indeed define
the $f$-functions in terms of them. In the previous paragraphs I argued that suitable
functions $\kappa$ can be determined from a flat initial prior by only assuming
the general properties of the Darwiche-Pearl algorithm. Moreover, these properties
guarantee the satisfaction of Preservation. So we can conclude that there
is indeed a unified epistemic criterion for determining the content of selection
functions, which respects both Preservation and Stalnaker’s constraint.
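The two evaluations just given can be reproduced from the flat context, and they also display the path dependency the next paragraph relies on. In the sketch below (hypothetical names, Darwiche-Pearl update as in the text), the paths $C \bullet K \bullet \{w_4\}$ and $C \bullet G \bullet \{w_4\}$ both end with $w_4$ alone at the zero level. They nevertheless order the remaining worlds differently, and so they disagree about the most plausible $A$-world from $w_4$’s point of view.

```python
from functools import reduce


def revise(kappa, prop):
    """(kappa . A) as given in the text."""
    r = min(kappa[w] for w in prop)
    return {w: (kappa[w] - r if w in prop else kappa[w] + 1) for w in kappa}


def most_plausible(kappa, prop):
    """The A-worlds of minimal rank."""
    r = min(kappa[w] for w in prop)
    return {w for w in prop if kappa[w] == r}


def follow(path, start):
    """Apply a sequence of updates kappa . A1 . A2 ... from left to right."""
    return reduce(revise, path, start)


FLAT = {w: 0 for w in ("w1", "w2", "w3", "w4")}
A, G, K = {"w1", "w3"}, {"w1", "w4"}, {"w3", "w4"}

via_K = follow([K, {"w4"}], FLAT)
via_G = follow([G, {"w4"}], FLAT)
```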
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We are also taking advantage here of a feature of Spohn’s functions, namely
that they are path-dependent. So, if $w$ is both in $K$ and in $G$, what is most
plausible from the point of view of $w$ depends on the plausibility orders for
$K$ and $G$. If one starts with $K$ (and its plausibility order $\kappa_K$), one gets one
estimation. If one starts with $G$ (and its plausibility order $\kappa_G$), one might get
another. And this seems quite natural. Path dependency is just another way of
saying that the rankings retain memory of where they come from. An epistemic
path from $K$ to a ranking that has $w_4$ at the zero level might be different from
an epistemic path from $G$ to a ranking that also has $w_4$ at the zero level. This
suggests the following recipe for defining contextual propositions:

$$[A > B]^{C,\kappa} = \{w \in C : f(A, w) \subseteq B\}, \text{ where } \rho(\kappa) = K \text{ and } f(A, w) = \kappa \bullet (\{w\})(A)$$



Now we can have: 

(RT) For every context set $C$, every $K \subseteq C$ and every $\kappa$ such that $\rho(\kappa) = K$:
$\rho(\kappa \bullet A) = K_A \subseteq B$ iff $K \subseteq [A > B]^{C,\kappa}$.



as well as a similar clause for the negation of conditionals. This can be done
even when Preservation is indeed obeyed. Rather than going into the details
of the existence proof, I shall focus on our example, showing what contextual
propositions actually look like.

Let’s first focus on $G$. If $\kappa_G$ ($\kappa'_G$) is the ranking for $G$, then the $f$ function
is: $f(A, w_1) = \{w_1\}$, $f(A, w_2) = \{w_1\}$, $f(A, w_3) = \{w_3\}$, $f(A, w_4) = \{w_1\}$.
Remember that this selection function seems difficult to motivate in terms of
overall similarity (in particular, it is hard to motivate $f(A, w_4) = \{w_1\}$). Now this
selection function emerges naturally by applying iterated changes to the initial
ranking $\kappa_G$. Actually, each selection function emerges quite naturally even if one
starts with a flat state $C$, after calculating $C \bullet G \bullet (\{w\})$ for each $w$ in $C$.

So $[A > G]^{C,\kappa_G} = \{w_1, w_2, w_4\}$.² Therefore the proposition expressed by
‘if Gore wins the popular vote this is a good outcome’ relative to $\kappa_G$ is indeed
entailed by $G$. And this aligns perfectly well with the fact that updating $G$ with
$A$ yields the ‘good’ state $w_1$.

By the same token, $[A > G]^{C,\kappa_K} = \{w_1\}$, and $[\neg(A > G)]^{C,\kappa_K} = \{w_2, w_3, w_4\}$.
So $K$ does entail $[\neg(A > G)]^{C,\kappa_K}$, which aligns with the fact that
conditioning $K$ on $A$ yields the ‘bad’ outcome $w_3$.
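The contextual propositions of the example, and (RT) for them, can be recomputed from the flat context. The sketch below is one reading of the recipe for $[A > B]^{C,\kappa}$, with hypothetical names. Here $f(A, w)$ is the set of most plausible $A$-worlds in $\kappa \bullet (\{w\})$, and $\rho(\kappa \bullet A)$ is the rank-0 set after updating with $A$. The negated conditional is computed as $[A > \neg B]^{C,\kappa}$, which agrees with the text’s values in this example.

```python
from itertools import combinations


def revise(kappa, prop):
    """(kappa . A) as given in the text."""
    r = min(kappa[w] for w in prop)
    return {w: (kappa[w] - r if w in prop else kappa[w] + 1) for w in kappa}


def most_plausible(kappa, prop):
    """The A-worlds of minimal rank."""
    r = min(kappa[w] for w in prop)
    return {w for w in prop if kappa[w] == r}


def belief_set(kappa):
    """rho(kappa): the worlds of rank 0."""
    return {w for w, r in kappa.items() if r == 0}


def contextual_conditional(context, kappa, antecedent, consequent):
    """[A > B]^{C,kappa}: worlds w in C whose f(A, w), read off kappa . ({w}), lies in B."""
    return {w for w in context
            if most_plausible(revise(kappa, {w}), antecedent) <= consequent}


C = {"w1", "w2", "w3", "w4"}
A, G, K = {"w1", "w3"}, {"w1", "w4"}, {"w3", "w4"}
FLAT = {w: 0 for w in sorted(C)}
kappa_G, kappa_K = revise(FLAT, G), revise(FLAT, K)

good_in_G = contextual_conditional(C, kappa_G, A, G)          # {w1, w2, w4}
good_in_K = contextual_conditional(C, kappa_K, A, G)          # {w1}
not_good_in_K = contextual_conditional(C, kappa_K, A, C - G)  # {w2, w3, w4}


def rt_holds(kappa):
    """(RT): rho(kappa . A) <= B iff rho(kappa) <= [A > B]^{C,kappa}, for every B <= C."""
    k = belief_set(kappa)
    revised_beliefs = belief_set(revise(kappa, A))
    return all(
        (revised_beliefs <= set(b)) == (k <= contextual_conditional(C, kappa, A, set(b)))
        for n in range(len(C) + 1) for b in combinations(sorted(C), n)
    )
```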

We can formulate Stalnaker’s constraint more precisely in this setting as 
follows: 

² Propositions are here indexed only by the initial context set $C$ and the corresponding
ranking, which in this case is $\kappa_G$. Adding $G$ would be repetitious. As a matter of
fact, the only parameters that matter in order to identify contextual propositions
are the initial context set $C$, the current context set (in this case $G$) and the function
$\bullet$. The ranking for $G$ can be naturally constructed in terms of $C$ and $\bullet$.
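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(Epistemic Dependency) If $\rho(\kappa) \cap A \neq \emptyset$, then for every $w \in \rho(\kappa) \cap A$,
$\rho(\kappa \bullet (\{w\}) \bullet A) = \{w\}$, and for every $w \in \rho(\kappa) \setminus (\rho(\kappa) \cap A)$,
$\rho(\kappa \bullet (\{w\}) \bullet A) \subseteq \rho(\kappa) \cap A$.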
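Under the Darwiche-Pearl update used in the text, Epistemic Dependency can be checked exhaustively on a small universe. The sketch below reads $\kappa \bullet (\{w\}) \bullet A$ as left-associated and enumerates all rankings of four worlds with bounded ranks and a non-empty rank-0 set. It tests both clauses for every non-empty $A$. As with any bounded search, this illustrates the property but does not prove it in general; the names are hypothetical.

```python
from itertools import combinations, product

WORLDS = ("w1", "w2", "w3", "w4")


def revise(kappa, prop):
    """(kappa . A) as given in the text."""
    r = min(kappa[w] for w in prop)
    return {w: (kappa[w] - r if w in prop else kappa[w] + 1) for w in kappa}


def belief_set(kappa):
    """rho(kappa): the worlds of rank 0."""
    return {w for w, r in kappa.items() if r == 0}


def dependency_holds(kappa, a):
    """Both clauses of Epistemic Dependency for one ranking kappa and one A."""
    k = belief_set(kappa)
    if not (k & a):
        return True  # the antecedent of the principle is not met
    for w in k:
        result = belief_set(revise(revise(kappa, {w}), a))
        if w in a and result != {w}:
            return False
        if w not in a and not (result <= (k & a)):
            return False
    return True


def check_all(max_rank=2):
    props = [set(c) for n in range(1, len(WORLDS) + 1)
             for c in combinations(WORLDS, n)]
    for ranks in product(range(max_rank + 1), repeat=len(WORLDS)):
        kappa = dict(zip(WORLDS, ranks))
        if belief_set(kappa) and not all(dependency_holds(kappa, a) for a in props):
            return False
    return True
```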



The intuitive idea is simpler than what the notation suggests. We simply have 
that every world in a context set C is more plausible, from the point of view of 
C, than any other world outside C (without provisos). Indicative conditionals 
do express propositions, but they are delicately dependent on epistemic context. 
When an agent who is in $G$ utters ‘if Gore wins the popular vote, this is a good
outcome’, this utterance is ultimately related to the fact that when he conditions
his view on $A$ (the antecedent) for the sake of the argument, he finds himself
in an outcome he considers good (Gore wins the vote and the election). And
when an agent in K utters the same conditional, this really means that from his 
point of view conditioning with A puts him in a ‘bad’ state (where Gore is not 
elected, but wins the popular vote). An important philosophical problem is to 
what extent the postulation of these propositions can be made compatible with a 
reasonable account of communication. Is this view so fine-grained that
communication is made implausible? This requires a detailed analysis that I
do not intend to offer here. My main goal was to show that a theory of contextual 
propositions for conditionals is possible. I also wanted to illustrate with examples
how these propositions are calculated in simple cases. With regard to the deeper 
philosophical problem of communication I shall say only in passing that a view 
that defends the contextual account has to show how we are capable of using 
utterances in order to elicit contexts of communication and to partially elicit the 
points of view of interlocutors. My own philosophical preferences lean towards 
the view that it is better to see conditionals as cognitive carriers (the expression 
is Ramsey’s) than to see them as carrying propositions. But a purely epistemic 
theory of contextual propositions is possible. In this article my main goal was 
to show the kind of epistemological commitments that one must undertake in
order to build it up.

Acknowledgements. I would like to thank two anonymous referees for their
useful comments.
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Abstract. This paper presents a technique for making neural networks
context-sensitive by using a symbolic context-management system to
manage their weights. Instead of having a very large network that itself
must take context into account, our approach uses one or more small
networks whose weights are associated with symbolic representations of
contexts an agent may encounter. When the context-management system
determines what the current context is, it sets the networks’ weights
appropriately for the context. This paper describes the approach and
presents the results of experiments that show that our approach greatly
reduces the training time of the networks as well as enhancing their
performance.



1 Introduction 

Neural networks are well known for their ability to separate continuous numerical 
data into a finite number of discrete classes. This functionality makes them 
a perfect candidate for converting real-valued data into symbolic values for a
symbolic system. We will call such symbolic values “linguistic values”, borrowing
the term from fuzzy logic (e.g., […]).

A problem arises, though, in real-world situations where linguistic values are
highly context-dependent. For example, an underwater agent may convert a
depth of 5 meters into TOO-DEEP while in a harbor, but later, when it finds
itself in the open ocean, 5 meters may be classified as NOMINAL. Thus, if
we wanted a neural network to convert an agent’s depth to a linguistic value,
we would need to encode context into the network, which would entail adding
nodes and connections to the network. It is easy to see that as the number
of contextual features and the number of possible contexts increase, this would
become unmanageable: in order to be context-sensitive, the network would have
to grow to an impractical size.

On the other hand, if a neural network could be constructed and trained
for use within a single context, it could be much smaller, and it would be fast
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to train and highly accurate. A large number of these simpler networks could
then do the work of the larger, more complex network. The problem would then
become one of somehow identifying which network to use in which context.

* This work was supported in part by the United States Office of Naval Research
through grants N0001-14-96-1-5009 and N0001-14-98-1-0648. The content does not

We have developed a way of making neural networks context-sensitive
that takes this second approach. We solve the problem of deciding which network
to use by making use of prior work on a context-management system that
explicitly represents the contexts an agent might reasonably be expected to encounter
[…]. With each contextual representation, or contextual schema, is stored
information useful for the agent in the context. For an agent that uses (or that
is) a neural network, contextual schemas contain the weights appropriate for the
network to use in the context represented by the schema. Rather than encoding
all the features of the context into the neural network’s weights, the
context-management system handles the problem of diagnosing the current context. This
frees the network to encode only those features having to do with categorizing a
value within that context. This greatly decreases both the complexity and training
time of the neural networks. In addition, the classification error rate is on par
with that of the fuzzy rule-based system currently used by our system for classification.

There has been some prior work aimed at developing context-sensitive neural 
networks. For example, Henninger et al. […] have developed context-sensitive
neural networks that use context to determine which network is to be used. 
Their work, however, makes use of a very simple form of context, i.e., an agent’s 
distance from a location. In contrast, our approach relies on a much richer notion 
of context that incorporates numerous environmental factors. This approach not 
only allows us to better tailor the networks to the situations in which they will be 
used, but also to remove most of the contextual information from the networks 
themselves, thus reducing their complexity. 

In the remainder of this paper, we discuss this approach in detail. Section 2 
discusses our domain, autonomous underwater vehicle (AUV) control, and the 
agent, Orca, within which our network resides and that provides the context 
management functionality. Section 3 presents two neural network designs, one 
using the context-management system and one that does not, that perform one 
classification task important for AUVs, depth categorization. Section 4 presents 
the experiments we used to compare these two approaches, including the results 
of the experiments. Section 5 describes how the neural network is integrated into 
the context-management system, and Section 6 concludes and discusses future 
work. 



2 AUVs, Orca, and Depth Management 

Autonomous underwater vehicles are untethered submersible robots that are 
capable of carrying out untended underwater missions. They are useful for a 
variety of tasks in oceanography, ocean engineering, aquaculture, industry, and 
defense […]. One of the major advantages of AUVs is their ability to operate
in an area without a human surface presence, which allows them to perform 
long-term missions without constant human supervision. AUVs can also operate 
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under hazardous conditions, such as within minefields, under ice, and during 
inclement weather. They can also be combined into multi-AUV systems that 
can perform tasks such as autonomous oceanographic sampling and surveying.
Figure 1 shows the two EAVE (Experimental Autonomous VEhicles) AUVs
that were used for developing AUV technology. 




Fig. 1. Two EAVE vehicles on a support barge. Used by permission of the Autonomous 
Undersea Systems Institute (AUSI), Lee, NH. 



The current state of the art of AUV hardware technology is such that competent
AUVs can be built and fielded. However, although there has been much
recent work on intelligent control of AUVs (e.g., […]), the competent control
software needed to carry out, successfully, a complex, autonomous, possibly long-term
mission is largely lacking. For this, an AUV must have sophisticated artificial
intelligence (AI) control software that can not only autonomously plan and
perform the missions, but that can also respond appropriately to the unexpected 
events that are sure to arise within the highly dynamic undersea environment. 

The Orca project […] has the goal of creating an intelligent mission controller
for long-term, possibly multiagent, ocean science missions. Orca, which is
currently in development at the University of Maine, is a context-sensitive agent 
that will be able to recognize the context it is in and behave appropriately. 

Orca’s context-sensitivity is conferred by its context-management module,
ECHO (Embedded Context Handling Object). Orca represents all contextual
knowledge as contextual schemas (c-schemas) […]. Each c-schema explicitly
represents a context, that is, a recognizable class of problem-solving situations.¹
A c-schema contains a wide variety of contextual knowledge useful for the agent



¹ See […] for a more complete description of our definition of context.
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as it operates within the context represented. For example, c-schemas contain 
knowledge about: the expected features of the context; context-specific semantic 
information (e.g., the meaning of fuzzy linguistic terms […]); how to cope with
unanticipated events (how to recognize them, diagnose them, evaluate their importance,
and handle them); goal priorities in the current context; which actions
are appropriate for which goals in the context; and the appropriate settings of 
behavioral and perceptual parameters for the context. 

C-schemas are stored in an associative long-term memory […] that, when presented
with features of the current situation, returns the c-schemas that most
closely match it. A process of diagnosis then occurs to determine which of these
evoked c-schemas actually are germane. These are then merged to form the
context object, which is a coherent picture of the current context. The knowledge
in this object can be used by the agent to quickly decide how to behave
appropriately for the context.

As an example of the usefulness of this approach, consider how an agent might 
determine the appropriate response to a catastrophic event such as a leak. For 
such a thing, there will be very little time in which to decide on a response, 
so complex reasoning after the leak may be impossible. Yet the appropriate 
response is context-specific. In a harbor, the AUV should probably land on the 
bottom and release a buoy, since that will avoid collisions with surface traffic. 
In the open ocean, however, landing might be disastrous: the bottom may be 
below the crush depth of the vehicle. Instead, since there would be very little 
likelihood of a collision, the appropriate response would be to surface and radio 
for help. If the agent always maintains an idea of what its current context is, 
then it can automatically take the appropriate response. 

For the purposes of this paper, we are concerned with context-sensitive perception,
that is, appropriate classification of sensory inputs. As an example,
consider the AUV’s depth envelope, that is, the range of depths that are allowable.
This is highly context-sensitive. If the agent finds itself in a harbor, it
should tighten its envelope to keep it away from the surface, where it is in danger 
of being hit by traffic, and above the bottom, where it may encounter debris that 
could ensnare it. If, on the other hand, the agent finds itself in the open ocean, 
it can loosen its envelope; the probability of surface traffic is minimal, so the 
agent only has to worry about staying above its crush depth. 



3 Neural Networks for Depth Classification 

Elsewhere, we have proposed a solution to the problem of context-sensitive
classification of sensory data that was based on a fuzzy rule-based system […].
However, we are interested in a mechanism that can learn from the agent’s own
experience or by being presented with training examples, since the AUV’s operating
conditions may change, and it may be difficult or impossible to obtain the
fuzzy rules from human experts. Consequently, we have begun investigating the
feasibility of using neural networks for this task.
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If the neural network is going to classify (e.g.) depth, context must be taken 
into account. This can be done in two ways: either a large network with a large 
number of inputs can be trained with data from instances of all the contexts the 
AUV may encounter, or numerous smaller networks can each be trained with 
data from instances of a single context. The former kind of network must not 
only classify the agent’s current depth, but also implicitly determine its current 
context in the process. To achieve this, the network will have to be provided 
not only with the current depth, but also with a large amount of environmental 
data that will help it determine the current context. The idea behind the smaller 
networks is that they will each specialize in classifying depth within a specific 
context. It will be up to the context-management system to determine which 
network is the appropriate one for the current situation. Since these networks
are so specialized, they will require only a few inputs, chiefly the agent’s current
depth.

The larger network will be very large indeed, and although it may be adequate
for all contexts, it is unlikely to be especially good for any particular
one. Training time for such a network will be long. The smaller networks will each
be simpler to train, and they will obviously be highly tailored to depth
categorization for the context in which they are used.

Our approach is conceptually the latter. However, instead of using numerous 
small networks, we have a single network whose weights are set based on the 
context. The weights for the network that are appropriate in a given context 
are stored in the c-schema representing that context. When that context is
recognized, the weights are retrieved from the c-schema and given to the neural
network. During the context, the network then behaves as a highly specific
network. When the context changes, any changes to the weights, e.g., from learning
sessions within the context, are then stored in the c-schema for use the next time 
the agent is in that context. A new c-schema is found for the new context, and 
its weights are loaded. This approach is particularly useful in a system such as 
Orca, in which the agent’s context is already being recognized and represented. 
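The weight-swapping scheme can be sketched in a few lines. Everything below is hypothetical and illustrative, not Orca’s or ECHO’s actual interface. A c-schema is reduced to a name plus a weight vector, the ‘network’ is reduced to the weight vector it currently uses, and learning is a caller-supplied weight change. The point shown is the life cycle the paragraph describes: weights are loaded when a context is recognized and written back to the c-schema when the context changes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CSchema:
    """Hypothetical stand-in for a c-schema; only the stored weights are modeled."""
    name: str
    weights: List[float] = field(default_factory=list)


class ContextSensitiveNetwork:
    """A single network whose weights are swapped when the context changes."""

    def __init__(self, schemas: List[CSchema]):
        self.schemas: Dict[str, CSchema] = {s.name: s for s in schemas}
        self.active: Optional[str] = None
        self.weights: List[float] = []

    def enter_context(self, name: str) -> None:
        # Store the weights learned in the old context back into its c-schema.
        if self.active is not None:
            self.schemas[self.active].weights = list(self.weights)
        # Load the weights stored with the newly recognized context.
        self.active = name
        self.weights = list(self.schemas[name].weights)

    def learn(self, deltas: List[float]) -> None:
        """Apply weight changes produced by some training session (not modeled here)."""
        if len(deltas) != len(self.weights):
            raise ValueError("weight vector size mismatch")
        self.weights = [w + d for w, d in zip(self.weights, deltas)]


net = ContextSensitiveNetwork([CSchema("harbor", [0.5, -0.25]),
                               CSchema("open_ocean", [1.0, 2.0])])
net.enter_context("harbor")
net.learn([0.25, 0.25])          # harbor weights become [0.75, 0.0]
net.enter_context("open_ocean")  # harbor weights saved, ocean weights loaded
net.enter_context("harbor")      # the learned harbor weights come back
```

Copying the weight lists on every swap keeps each c-schema’s stored weights independent of the live network, so learning in one context cannot silently alter another context’s weights.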

In the next section, we discuss experiments we performed to determine if a 
large number of small networks, such as is effectively used in our approach, is as 
good as or better than a large, multi-context network. In the following section, 
we discuss how this can be integrated into Orca. 



4 Experiments and Analysis 

There are several criteria for determining if the smaller networks are better suited 
for our task. First, the smaller networks would have to perform the classification 
task at least as well as the larger network. Second, since we would be training 
numerous smaller networks, they would have to train much faster than the larger 
network. Finally, we would have to show that the size of the larger network 
grew at an unacceptable rate as more contexts were added to the system. In 
this section we will show that the set of small networks classifies better than the
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single large network. We will then show, in a less formal manner, that the smaller 
networks train faster and that the larger network grows unmanageably large. 

For the following experiments we made a few decisions regarding our networks. All of our networks were two-layer feed-forward networks [3] with 5 output nodes. Each of these output nodes stood for one of our linguistic values: TOO_SHALLOW, SHALLOW, NOMINAL, DEEP, TOO_DEEP. The output node with the highest value was taken as the classification. Our smaller networks each had five nodes in their hidden layer. The number of hidden nodes was determined as follows. Starting with a network with two hidden nodes, we trained the network with two-thirds of the training data and then evaluated its performance on the final third. A node was then added and the network was retrained and tested. This process was repeated until an acceptable level of performance was reached. This process is a form of cross-validation [?]. The number of nodes in the hidden layer of the larger network varied with the number of contexts, but was also chosen via cross-validation. Training was done with the Levenberg-Marquardt algorithm [?]. All data was collected via Matlab programs running on a 1.5 GHz Intel Xeon PC under the Linux operating system.
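The output decoding and the hidden-layer sizing procedure described above can be sketched as follows. This is a sketch, not the authors' Matlab code: `holdout_error` is a hypothetical callback standing in for "train on two-thirds, test on the final third", and `max_nodes` is an assumed safety limit the text does not mention.

```python
from typing import Callable, Sequence

# One output node per linguistic value, in this order.
LABELS = ("TOO_SHALLOW", "SHALLOW", "NOMINAL", "DEEP", "TOO_DEEP")


def classify(outputs: Sequence[float]) -> str:
    """Return the linguistic value whose output node has the highest activation."""
    if len(outputs) != len(LABELS):
        raise ValueError("expected one output per linguistic value")
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return LABELS[best]


def choose_hidden_nodes(
    holdout_error: Callable[[int], float],
    acceptable_error: float,
    start: int = 2,
    max_nodes: int = 50,
) -> int:
    """Add hidden nodes one at a time until the held-out error is acceptable.

    holdout_error(n) is a hypothetical callback that should train a network with
    n hidden nodes on two-thirds of the data and return its error rate on the
    remaining third.
    """
    for n in range(start, max_nodes + 1):
        if holdout_error(n) <= acceptable_error:
            return n
    raise RuntimeError("no hidden-layer size up to max_nodes was acceptable")
```

For example, `classify([0.1, 0.2, 0.9, 0.3, 0.0])` returns `"NOMINAL"`.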

When we set out to test the classification performance of our networks, we decided to keep the number of contexts small in order to simplify the training of the larger network. Thus, we chose to test the performance of the networks at classifying depths within a harbor, on a shoal, and in the open ocean. In order to implement the larger network we had to determine the most salient features of the contexts and use them as parameters. We chose seven parameters, among them depth, distance from shore, water column depth, density of fish, density of debris on the bottom, and density of surface traffic. The training data for all of these parameters was then generated randomly from a normal distribution around the accepted values for each parameter. Since we wanted these networks to perform as well as our current fuzzy system, we used it to generate the correct classifications for our training data.
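The data-generation step can be sketched as below, assuming each parameter is drawn independently from a normal distribution. The nominal means and standard deviations in `HARBOR` and the labeling rule in `toy_label` are invented for illustration; `toy_label` is a hypothetical placeholder for the fuzzy system, which is not reproduced here.

```python
import random
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]


def generate_samples(
    nominal: Dict[str, Tuple[float, float]],
    label_fn: Callable[[Sample], str],
    n: int,
    seed: int = 0,
) -> List[Tuple[Sample, str]]:
    """Draw n samples, each parameter ~ Normal(mean, sd), and label each one."""
    rng = random.Random(seed)          # seeded so the data set is reproducible
    samples = []
    for _ in range(n):
        x = {name: rng.gauss(mean, sd) for name, (mean, sd) in nominal.items()}
        samples.append((x, label_fn(x)))
    return samples


def toy_label(x: Sample) -> str:
    """Hypothetical stand-in for the fuzzy depth classifier (thresholds invented)."""
    ratio = x["depth"] / max(x["water_column_depth"], 1e-6)
    if ratio < 0.1:
        return "TOO_SHALLOW"
    if ratio < 0.25:
        return "SHALLOW"
    if ratio < 0.6:
        return "NOMINAL"
    if ratio < 0.85:
        return "DEEP"
    return "TOO_DEEP"


# Hypothetical nominal values (mean, standard deviation) for a harbor context.
HARBOR = {"depth": (8.0, 2.0), "water_column_depth": (15.0, 3.0)}
data = generate_samples(HARBOR, toy_label, n=200)
```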

Next, the large network was trained with the entire training set, while the smaller networks were trained with the data from their respective contexts. Upon completion of training, the networks were tested with more randomly generated data. The rates of classification error can be seen in Table 1. Performing a one-sided t-test with $H_0: \mu_{\text{large-net}} = \mu_{\text{small-nets}}$ and $H_1: \mu_{\text{large-net}} > \mu_{\text{small-nets}}$, we get a t statistic of 2.218 and a critical value of 1.649 when $\alpha = 0.05$. Thus, we reject the null hypothesis and can say that the smaller nets generate, on average, fewer errors.
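The text does not say which t-test variant was used. The sketch below assumes Welch's unequal-variance statistic for the one-sided alternative; the critical value is supplied by the caller, since the Python standard library has no t distribution.

```python
import math
from statistics import mean, variance
from typing import Sequence


def welch_t(large: Sequence[float], small: Sequence[float]) -> float:
    """Welch's t statistic for H1: mean(large) > mean(small)."""
    se = math.sqrt(variance(large) / len(large) + variance(small) / len(small))
    return (mean(large) - mean(small)) / se


def reject_h0(t_stat: float, critical: float) -> bool:
    """One-sided test: reject H0 when the statistic exceeds the critical value."""
    return t_stat > critical


# With the statistic and critical value reported in the text:
print(reject_h0(2.218, 1.649))   # True
```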



Table 1. Rate of Classification Errors

                  Harbor   Shoal    Ocean    Mean
Large Network     0.0216   0.0197   0.0202   0.0205   0.000008
Small Networks    0.0378   0.0118   0.001    0.0169   0.00008
Another important aspect of the networks is how long they take to train. Each of the smaller networks takes about the same amount of time to train, so as we increase the number of smaller networks, the training time increases linearly. Consequently, given the one-to-one correspondence between networks and contexts in this approach, overall training time is linear in the number of contexts. It is intuitive that the smaller networks should train faster, but we have to verify that, as the number of contexts increases, the training time for the larger network grows faster than the training time for the set of smaller networks. Table 2 and Fig. 2 show how the training times for the networks grow as the number of contexts increases from one to six. The training time for the larger network clearly increases much faster than that of the smaller networks. We attribute this to the fact that as the number of contexts increases, the larger network grows both in size and in the number of inputs. It also requires more training data to allow it to distinguish between contexts.



Table 2. Training times as the number of contexts is increased

# of Contexts   Large Network          Small Networks
1               12 seconds             11.5 seconds
2               1 minute 9 seconds     24 seconds
3               2 minutes 30 seconds   37 seconds
4               6 minutes 23 seconds   49 seconds
5               10 minutes 16 seconds  62 seconds
6               18 minutes 37 seconds  77 seconds
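One way to make the growth comparison concrete is to look at the extra training time each added context costs. The sketch below only re-tabulates Table 2 (converted to seconds) and fits a least-squares line to the small-network times; the choice of these two summaries is ours, not the paper's.

```python
# Training times from Table 2, in seconds, for 1 through 6 contexts.
LARGE = [12, 69, 150, 383, 616, 1117]
SMALL = [11.5, 24, 37, 49, 62, 77]


def increments(times):
    """Extra training time incurred by each additional context."""
    return [b - a for a, b in zip(times, times[1:])]


def ls_slope(times):
    """Least-squares slope of training time against the number of contexts."""
    n = len(times)
    xs = range(1, n + 1)
    x_bar = (n + 1) / 2
    y_bar = sum(times) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, times))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


print(increments(SMALL))          # [12.5, 13, 12, 13, 15]
print(increments(LARGE))          # [57, 81, 233, 233, 501]
print(round(ls_slope(SMALL), 2))  # 12.96
```

The small networks cost a roughly constant 12 to 15 seconds per added context, while the large network's increments keep growing, which is consistent with the linear versus faster-than-linear reading in the text.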



Our final test was to find out how fast the number of nodes in our larger network increased as the number of contexts increased. We wanted to know this because as this number increased, so would the runtime and the training time of the network. The number of nodes needed was calculated through simple cross-validation [?], as discussed earlier. Table 3 shows the results of this experiment. The network appears to grow exponentially. If this trend continues, the larger network will soon become both inefficient and impossible to manage.

This set of experiments demonstrates several things. First, it shows that the smaller networks perform better at the classification task than the larger network. Next, we saw that the training time of the set of smaller networks increases linearly as the number of contexts increases, while the training time of the larger network appears to increase at a much faster rate. Finally, we saw that the number of nodes in the larger network exploded as more contexts were added. Taking all this into account, the logical choice is to use the set of smaller networks for our classification tasks.
[Plot: "Training Times with Respect to the Number of Contexts"; x-axis: number of contexts; y-axis: training time (seconds).]

Fig. 2. Graph of training time as contexts are increased
Table 3. Number of nodes needed to classify depth

# of Contexts   # of nodes in hidden layer
1               5
2               8
3               20
4               32
5               63
6               112
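To put a number on "growing in an exponential fashion", one can fit log(nodes) against the number of contexts. This sketch assumes an exponential model purely for the fit; the data are the hidden-layer sizes from Table 3.

```python
import math

# Hidden-layer sizes from Table 3, for 1 through 6 contexts.
HIDDEN = [5, 8, 20, 32, 63, 112]


def growth_factor(counts):
    """Fit log(count) = a + b*k by least squares; return exp(b), the average
    multiplicative growth per added context under an exponential model."""
    n = len(counts)
    xs = range(1, n + 1)
    ys = [math.log(c) for c in counts]
    x_bar = (n + 1) / 2
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return math.exp(sxy / sxx)


print(round(growth_factor(HIDDEN), 2))   # 1.89
```

Under this model the hidden layer grows by a factor of roughly 1.9 per added context.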



5 Embedding the Network within a Schema 

As previously discussed, Orca’s context-management system explicitly represents 
contexts in the form of contextual schemas. A c-schema incorporates everything 
that is deemed important about a given context. This should include information 
about the operation of neural networks in the context represented. This can be 
done in one of two ways. 

As mentioned earlier in this paper, the smaller networks differed only in their weights. This static architecture makes it very easy to store the weights in a c-schema. The simplest, although slightly short-sighted, way to make these networks context-sensitive would be to save the weights in a simple list. This makes it very easy for the context manager to manipulate the weights. When the weights are needed they can simply be read in order and slotted into the appropriate spot in the neural network. Likewise, if the context manager deems it necessary to retrain a network, the stored weights can easily be replaced.
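The flat-list scheme can be sketched as follows, assuming a two-layer network without bias terms. The function names are hypothetical. Note that the sizes `n_in`, `n_hidden` and `n_out` must be known to the code that unpacks the list; that dependence on a fixed architecture is the short-sightedness mentioned above.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # one row of input weights per node


def to_flat_list(w_hidden: Matrix, w_output: Matrix) -> List[float]:
    """Hidden-node weights first, then output-node weights, row by row."""
    return [w for row in w_hidden + w_output for w in row]


def from_flat_list(flat: Sequence[float], n_in: int, n_hidden: int,
                   n_out: int) -> Tuple[Matrix, Matrix]:
    """Slot a flat list back into a network whose sizes are known in advance."""
    expected = n_hidden * n_in + n_out * n_hidden
    if len(flat) != expected:
        raise ValueError(f"expected {expected} weights, got {len(flat)}")
    it = iter(flat)
    w_hidden = [[next(it) for _ in range(n_in)] for _ in range(n_hidden)]
    w_output = [[next(it) for _ in range(n_hidden)] for _ in range(n_out)]
    return w_hidden, w_output
```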






Context-Sensitive Weights for a Neural Network 



37 



The problem with this method is that if in the future we decide to change the structure of the network, we would have to rewrite all of our network-handling functions. To solve this problem, we could encode the structure of the network into the list. By nesting lists we could represent any feed-forward network structure while maintaining a simple representation. Figure 3 shows how this could work. The idea behind this scheme is that each layer is represented as a list within the master list, and each node's input weights are stored as a list within its layer's list. We intend to use this method in our system, although we will have to watch for performance degradation from the greater complexity it entails.




( ( (w1 w3) (w2 w4) )
  ( (w5 w7) (w6 w8) )
  ( (w9 w10) ) )



Fig. 3. Converting a network to a list 
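A forward pass that reads the network's shape from such a nested list, rather than from hard-coded sizes, might look like the sketch below. The sigmoid activation on every layer and the absence of bias terms are assumptions made for illustration; the text does not specify either.

```python
import math
from typing import List, Sequence

# layers -> nodes -> that node's input weights, mirroring the nesting in Fig. 3
NestedNet = List[List[List[float]]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(net: NestedNet, inputs: Sequence[float]) -> List[float]:
    """Run a feed-forward pass whose shape is read from the nested list itself."""
    activations = list(inputs)
    for layer in net:
        outputs = []
        for node_weights in layer:
            if len(node_weights) != len(activations):
                raise ValueError("node has the wrong number of input weights")
            z = sum(w * a for w, a in zip(node_weights, activations))
            outputs.append(sigmoid(z))
        activations = outputs
    return activations


# Same shape as Fig. 3: two 2-node layers, then a single output node.
fig3_shape = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0]]]
print(forward(fig3_shape, [3.0, -2.0]))   # [0.5]
```

Changing the architecture now only means storing a differently shaped list; `forward` itself does not change.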



6 Conclusions 

This paper presents a technique that allows a neural network to be context- 
sensitive without becoming unmanageable. This is achieved by allowing the con- 
textual aspect of the problem to be handled by a symbolic context-management 
system. By using this technique we are able to use the efficient classification
and learning algorithms associated with neural networks without the explosion 
in network size that comes with the addition of more contextual data. 

While we have only shown this technique to work for our classification task, 
we believe that it could be applied to many other neural network applications 
where the network’s task and/or domain can be partitioned naturally into con- 
texts in which the network must operate. In the future, we hope to show that 
this technique extends to other real-world domains and also maps to other neural
network architectures. 

This project has raised many questions that we hope to look at in future 
work. First, we have to implement and test a method for retraining networks 
during a mission. One possibility is for the context manager to recognize that it 
is in a training context (by retrieving a corresponding c-schema) and to allow the 
weights to change while in that context. Another avenue that we are currently 
exploring, not only with respect to these networks but with our entire notion
of context, is how to merge disparate contextual information. For example, an 
AUV may be in the context of a harbor and also a rescue mission. The harbor 
context may tell the AUV that the bottom of the harbor is too deep while 
the rescue context will want to eliminate the idea of a depth envelope entirely. 
Finally, another thing that we have to look at is whether we have lost anything 
by not using the larger network. This paper shows that it does not classify as 
well, but we may be able to draw other information from it such as when to 
merge contexts, or we may be able to use it as an indicator for when context is 
changing. 
Abstract. Causes are defined informally to be events which are both
contextually necessary and contextually sufficient for their effects. A for- 
mal, logico-pragmatic, definition is then given and discussed. 



1 Introduction 

We commonly talk of one event causing another, and think of the events as being connected; of there being a bond or nexus between them. However, Hume observed [?, Book I, Part III] that if we concentrate on a pair of such events in isolation from the rest of the universe, then the only relations that can be detected between them are succession and contiguity; the occurrence of the "cause" is followed by the occurrence of the "effect". Consequently he suggested that when we talk of an event $e$ causing an event $e'$, at least part of what we mean is that $e$ is an event of type $A$ and $e'$ is an event of type $B$, and that events of type $A$ are always followed by events of type $B$. The individual events are thus subsumed under an inductively learned regularity. Hume appears to have considered this sufficiency account to be equivalent to a necessity account:

[W]e may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second. Or, in other words where, if the first object had not been, the second never had existed. [?, §VII, Part II]

As Lewis [?] observes, Hume's necessity account is counterfactual in nature, as it involves implicit counterfactual reference to the context in which the purported cause occurs. Hume can thus be seen as proposing a pragmatic, or context-dependent, account of causation, according to which the concept has no independent reality, but rather should be seen as a useful abstraction from experience and particular contexts.

This paper proposes a formal theory of causation along these lines. The theory is based on a common sense theory of events [?] which takes their defeasible, context-dependent nature seriously. Event types are defined by giving their preconditions and effects, the former being regarded as necessary conditions for the success of events of this type, and the latter being the effects which regularly follow them should they succeed. The idea is that events (tokens of the type) normally succeed if their preconditions obtain on occurrence; that is, their preconditions are, on occurrence, normally also sufficient for their effects. This idea is unpacked by considering the context in which an event occurs, its context of occurrence, which is typically incompletely specified or partial. The preconditions should be such that they are sufficient in most contexts, but will typically not be sufficient in all of them. The success of an event is then inferred by default whenever doing so is consistent with its context of occurrence. For example, the preconditions of a blocks-world Pickup operator are that the robot's hand is empty, and that the block is on the table and is clear (has nothing on top of it), and the effects are that the robot is holding the block, its hand is no longer empty, and the block is no longer on the table or clear. If a Pickup event occurs in a context in which its preconditions are all true and its success is consistent with the context, then the Pickup event should succeed, and its effects should follow. This view of events can be used to define causation as follows. An event $e$ is said to be contextually sufficient for (a fact or other event) $\phi$ in context $c$ iff $e$ succeeds in $c$ and $\phi$ is among $e$'s effects. Event $e$ is said to be contextually necessary for $\phi$ in $c$ iff removing $e$ from $c$ would also remove $\phi$ from $c$. A (direct) cause can then be defined to be an event which is contextually necessary and contextually sufficient for its effect. Thus, for example, the Pickup event is said to be the cause of the block's being held in a given context iff the event succeeds in the context and, had it not occurred in the context, the block would have remained on the table.
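The informal definitions of contextual sufficiency and necessity can be illustrated with a one-step, propositional toy. This is not the formal CTC treatment given later. Facts are strings, a context is a set of facts plus a list of occurring events, and `blocked` is a hypothetical stand-in for "success is inconsistent with the context of occurrence".

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Event:
    name: str
    pre: FrozenSet[str]      # preconditions (facts that must hold)
    add: FrozenSet[str]      # effects made true
    delete: FrozenSet[str]   # effects made false


def succeeds(e: Event, facts: Set[str], blocked: Set[str]) -> bool:
    # Success by default: preconditions hold and the (hypothetical) set of
    # blocked events does not mark e as inconsistent with this context.
    return e.pre <= facts and e.name not in blocked


def project(facts: Set[str], events: Iterable[Event], blocked: Set[str]) -> Set[str]:
    """Facts one step later: successful events apply their effects, all else persists."""
    nxt = set(facts)
    for e in events:
        if succeeds(e, facts, blocked):
            nxt = (nxt - e.delete) | e.add
    return nxt


def is_cause(e, phi, facts, events, blocked) -> bool:
    sufficient = e in events and succeeds(e, facts, blocked) and phi in e.add
    others = [x for x in events if x != e]           # remove e from the context
    necessary = phi not in project(facts, others, blocked)
    return sufficient and necessary


PICKUP_A = Event(
    "pickup(A)",
    pre=frozenset({"handempty", "ontable(A)", "clear(A)"}),
    add=frozenset({"holding(A)"}),
    delete=frozenset({"handempty", "ontable(A)", "clear(A)"}),
)
```

With context `{"handempty", "ontable(A)", "clear(A)"}`, `is_cause(PICKUP_A, "holding(A)", ...)` holds; if `holding(A)` were already true, removing the event would not remove it, so the event would be sufficient but not necessary.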

The formal theory is expressed in a three-valued language called the Causal Temporal Calculus, which is presented in the next section. The theory of events is then summarized in Section 3, and the theory of causation is presented and discussed in Section 4.



2 The Causal Temporal Calculus 

The Causal Temporal Calculus (CTC) is a straightforward extension of the Temporal Calculus (TC) [?], which in turn is based on Kleene's strong three-valued language [?]. This provides a means for reasoning demi-classically with partial information and classically with complete information. Accordingly, the truth conditions for the propositional operators return a Boolean truth value wherever possible. Thus the sentence $\neg\phi$ is true if $\phi$ is false, false if $\phi$ is true, and undefined otherwise. And the sentence $\phi \wedge \psi$ is true if $\phi$ and $\psi$ are both true, false if either is false, and undefined otherwise. Further operators, such as inclusive and exclusive disjunction, can be defined as in classical logic, $\phi \vee \psi =_{Df} \neg(\neg\phi \wedge \neg\psi)$ and $\phi \oplus \psi =_{Df} (\phi \vee \psi) \wedge \neg(\phi \wedge \psi)$, while a sentence of the form $\phi \equiv \psi$ is true if $\phi$ and $\psi$ have the same truth value (true, false, undefined) and is false otherwise. This approach is extended to the first-order case. Atomic sentences may be true, false, or undefined. And a universal sentence $\forall x\phi$ is true if $\phi$ is true for all assignments to $x$, false if $\phi$ is false for at least one such assignment, and undefined otherwise. The existential quantifier $\exists$ is then defined as in classical logic.
The undefined operator 'U' is added to Kleene's language in order to represent and reason about partiality. The sentence $U\phi$ is true if $\phi$ is undefined (is neither true nor false), and is false otherwise. This operator is used to define the classically-valued operators $T$, $F$, $\rightarrow$ and $\equiv$ as follows:

$$T\phi =_{Df} \neg(U\phi \vee \neg\phi), \qquad F\phi =_{Df} \neg(U\phi \vee \phi), \qquad \phi \rightarrow \psi =_{Df} \neg T\phi \vee T\psi,$$

$$\phi \equiv \psi =_{Df} (T\phi \wedge T\psi) \vee (F\phi \wedge F\psi) \vee (U\phi \wedge U\psi).$$

Thus, for sentences $\phi$ and $\psi$: $T\phi$ is true if $\phi$ is true, and is false otherwise; $F\phi$ is true if $\phi$ is false, and is false otherwise; and $\phi \rightarrow \psi$ is true if $\psi$ is true or $\phi$ is not true, and is false otherwise.
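The propositional part of these definitions can be checked mechanically. In the sketch below, `None` encodes "undefined" (our encoding choice), and each derived operator is built literally from its $=_{Df}$ definition, so the tests confirm that $T$, $F$, $\rightarrow$ and $\equiv$ are always classically valued.

```python
from typing import Optional

TV = Optional[bool]   # True, False, or None for "undefined"


def neg(p: TV) -> TV:
    return None if p is None else not p


def conj(p: TV, q: TV) -> TV:
    if p is False or q is False:
        return False
    if p is True and q is True:
        return True
    return None


def disj(p: TV, q: TV) -> TV:        # φ ∨ ψ =Df ¬(¬φ ∧ ¬ψ)
    return neg(conj(neg(p), neg(q)))


def U(p: TV) -> TV:                  # true iff p is undefined; never undefined itself
    return p is None


def T(p: TV) -> TV:                  # Tφ =Df ¬(Uφ ∨ ¬φ)
    return neg(disj(U(p), neg(p)))


def F(p: TV) -> TV:                  # Fφ =Df ¬(Uφ ∨ φ)
    return neg(disj(U(p), p))


def implies(p: TV, q: TV) -> TV:     # φ → ψ =Df ¬Tφ ∨ Tψ
    return disj(neg(T(p)), T(q))


def equiv(p: TV, q: TV) -> TV:       # (Tφ ∧ Tψ) ∨ (Fφ ∧ Fψ) ∨ (Uφ ∧ Uψ)
    return disj(disj(conj(T(p), T(q)), conj(F(p), F(q))), conj(U(p), U(q)))
```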

In order to represent time, and thus change, a temporal index is added to each atom of the underlying language. A domain atom is an atom of the form $r(u_1,\dots,u_n)(t)$, where the $u_i$ are terms denoting objects in the domain, and term $t$ denotes a time point. Intuitively, a domain atom $r(u_1,\dots,u_n)(t)$ states that the relation $r$ holds between the objects $u_1,\dots,u_n$ at time $t$, that is, that the fact $r(u_1,\dots,u_n)$ is true at $t$. For example, Axiom (?) in Table ? states that object $O1$ is at location $L1$ at time 1, etc.

In order to represent change, events are added as a separate sort. Thus an occurs atom is an atom of the form $Occ(e)(t)$, stating that a token of event type $e$ occurs at time $t$; for example, the occurs atom in Axiom (?) states that the event consisting of $O1$ moving from location $L1$ to $L2$ occurs at time 1.

In order to represent inertia, facts are added as a fourth sort. Formally, a 
fact is the atemporal component, $a$, of a domain atom $a(t)$.

Finally, in order to define causation, TC is extended to CTC by adding a logical truth operator on TC-sentences and allowing quantification over them. Thus if $\phi$ is a sentence of TC, then $\Box\phi$ is a sentence of CTC which states that $\phi$ is a logical truth of CTC. And a TC-formula atom is an atom of the form $r_\Phi(\phi_1,\dots,\phi_n)$, where the $\phi_i$ are TC-formulas; for example, $Cause(Occ(e)(t), Occ(e')(t+1))$ states that $Occ(e)(t)$ is the cause of $Occ(e')(t+1)$.

The five sorts of CTC are identified by the following letters: D for domain objects, T for time points, E for events, F for facts, and $\Phi$ for TC-formulas.

Definition 1 The vocabulary of CTC consists of the symbols 'U', '$\wedge$', '$\forall$', '=', and the following, mutually disjoint, countable sets of symbols:

— $C_D, C_T, C_E$ (constants of sorts D, T and E),

— $V_D, V_T, V_E, V_F, V_\Phi$ (variables of each sort),

— $F_D, F_T, F_E$ (function symbols of each arity $n \ge 1$ of sorts D, T, E), and

— $R_D, R_E, R_F, R_\Phi$ (relation symbols of each arity $n \ge 0$ of sorts D, E, F, and $\Phi$).



Definition 2 The terms of each sort S are defined as follows:

— If S is of sort D or T then
$term_S = C_S \cup V_S \cup \{f(u_1,\dots,u_n) : n\text{-ary } f \in F_S,\ u_i \in term_S\}$.

— $term_E = C_E \cup V_E \cup \{f(u_1,\dots,u_n) : n\text{-ary } f \in F_E,\ u_i \in term_D\}$.

— $term_F = C_F \cup V_F$, where
$C_F = \{r_F(u_1,\dots,u_n) : n\text{-ary } r_F \in R_F,\ u_i \in term_D\}$.

— $term_\Phi = C_\Phi \cup V_\Phi$, where $C_\Phi = TC$ is defined in Definition 3.



Definition 3 TC is the minimal set which satisfies the following conditions.

— If $t, t' \in term_T$ then $t < t' \in TC$.

— If S is of sort D, E or F, $u_1,\dots,u_n \in term_S$, $r_S$ is an n-ary relation symbol in $R_S$, and $t \in term_T$, then $r_S(u_1,\dots,u_n)(t) \in TC$.

— If $v \in V_F$ and $t \in term_T$ then $v(t) \in TC$.

— If S is of sort D, T, E or F, and $u, u' \in term_S$, then $u = u' \in TC$.

— If $\phi, \psi \in TC$, then $\neg\phi \in TC$, $U\phi \in TC$, and $(\phi \wedge \psi) \in TC$.

— If S is of sort D, T, E or F, $v \in V_S$ and $\phi \in TC$, then $\forall v\phi \in TC$.

The members of TC are called formulas (of TC). Those formulas in which no variable occurs free are called sentences (of TC).



Definition 4 CTC is the minimal set which satisfies the following conditions.

— $TC \subseteq CTC$.

— If $\phi_1,\dots,\phi_n \in term_\Phi$ and $r_\Phi$ is an n-ary relation symbol in $R_\Phi$, then $r_\Phi(\phi_1,\dots,\phi_n) \in CTC$.

— If $\phi \in TC$ and $\psi$ is the result of substituting zero or more variables in $V_\Phi$ for sub-formulas in $\phi$, then $\Box\psi \in CTC$.

— If $\phi, \psi \in CTC$, then $\neg\phi \in CTC$, $U\phi \in CTC$, and $(\phi \wedge \psi) \in CTC$.

— If S is any sort, $v \in V_S$ and $\phi \in CTC$, then $\forall v\phi \in CTC$.

The members of CTC are called formulas (of CTC). Those formulas in which no variable occurs free are called sentences (of CTC).

Models of CTC consist of a set $\mathcal{D}$ of domain objects, a set $\mathcal{E}$ of event types, a temporal frame $(\mathcal{T}, \mathcal{R}_T)$ (where $\mathcal{T}$ is a set of time points and $\mathcal{R}_T$ is the before-after relation on $\mathcal{T}$), and interpretation functions for terms and relations. For simplicity, time is assumed to be discrete and linear. The denotations of terms are always defined and do not vary with time. By contrast, relations are interpreted by time-dependent, partial, characteristic functions; thus the interpretation of relations may be partial and may vary with time.

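Definition 5 A model for CTC is a structure $M = \langle \mathcal{D}, \mathcal{E}, (\mathcal{T}, \mathcal{R}_T), \mathcal{F}, \mathcal{R}, \mathcal{V} \rangle$, where:

— $\mathcal{D}$, $\mathcal{E}$ and $\mathcal{T}$ are mutually disjoint, countable, non-empty sets,

— $\mathcal{R}_T$ is a binary relation on $\mathcal{T}$ which is discrete and linear,

— $\mathcal{F} = (\mathcal{F}_D, \mathcal{F}_T, \mathcal{F}_E)$, where, for each pair $(S, \mathcal{S}) \in \{(D, \mathcal{D}), (T, \mathcal{T}), (E, \mathcal{E})\}$, $\mathcal{F}_S$ is a set of n-ary functions of type $\mathcal{S}$ for each $n \ge 1$,

— $\mathcal{R} = (\mathcal{R}_D, \mathcal{R}_E, \mathcal{R}_F, \mathcal{R}_\Phi)$, where, for each pair $(S, \mathcal{S}) \in \{(D, \mathcal{D}), (E, \mathcal{E}), (F, C_F), (\Phi, C_\Phi)\}$, $\mathcal{R}_S$ is a set of partial n-ary functions of type $\mathcal{S}^n \rightarrow \{true, false\}$ for each $n \ge 0$,

— $\mathcal{V}$ is an interpretation function such that

  • $\mathcal{V}_S : C_S \rightarrow \mathcal{S}$ and $\mathcal{V}^F_S : F_S \rightarrow \mathcal{F}_S$ for $(S, \mathcal{S}) \in \{(D, \mathcal{D}), (T, \mathcal{T}), (E, \mathcal{E})\}$,

  • $\mathcal{V}_F : C_F \rightarrow C_F$ and $\mathcal{V}_\Phi : C_\Phi \rightarrow C_\Phi$ are identity functions,

  • $\mathcal{V}_S : R_S \times \mathcal{T} \rightarrow \mathcal{R}_S$.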



Terms are interpreted in the standard way. 

Definition 6 A variable assignment for a CTC-model is a function $g = (g_D, g_T, g_E, g_F, g_\Phi)$, where each $g_S$ assigns to every variable in $V_S$ an element of the corresponding domain of sort S. For CTC-model $M$, interpretation function $\mathcal{V}$ and variable assignment $g$ for $M$, the term evaluation function $\mathcal{V}_g$ is defined, for each CTC-term $u$ and sort S, as follows:

$$
\mathcal{V}_g(u) =
\begin{cases}
\mathcal{V}_S(u) & \text{if } u \in C_S, \\
g_S(u) & \text{if } u \in V_S, \\
\mathcal{V}^F_S(f)(\mathcal{V}_g(u_1), \dots, \mathcal{V}_g(u_n)) & \text{otherwise, where } u = f(u_1, \dots, u_n).
\end{cases}
$$

The truth and falsity of sentences can now be defined by means of the intermediary notions of satisfaction and violation.

Definition 7 Let $M = \langle \mathcal{D}, \mathcal{E}, (\mathcal{T}, \mathcal{R}_T), \mathcal{F}, \mathcal{R}, \mathcal{V} \rangle$ be a CTC-model, $g$ be a variable assignment for $M$, and $\phi$ be a CTC-formula. Then $g$ satisfies $\phi$ in $M$ (written $M, g \models \phi$) or violates $\phi$ in $M$ (written $M, g \mathrel{=\!\!|} \phi$) according to the clauses given in Table 1, where the notation $g \sim_v g'$ is used to indicate that the variable assignments $g$ and $g'$ differ at most on the assignment to variable $v$. A formula $\phi$ is true in a CTC-model $M$ (written $M \models \phi$) if $M, g \models \phi$ for all variable assignments $g$. A formula $\phi$ is false in $M$ (written $M \mathrel{=\!\!|} \phi$) if $M, g \mathrel{=\!\!|} \phi$ for all variable assignments $g$.

It is straightforward to prove (by means of a parallel induction on the structure of CTC-formulas) that, for any model $M$ and sentence $\phi$, either $M \models \phi$, or $M \models \neg\phi$, or $M \models U\phi$. Consequently, as in classical logic, it is sufficient to consider the truth relation on sentences of CTC.

Definition 8 A CTC-model $M$ is said to be a model of a sentence $\phi$ if $M \models \phi$. Similarly, $M$ is said to be a model of a set of sentences $\Theta$ (written $M \models \Theta$) if $M \models \phi$ for every $\phi \in \Theta$. A set of sentences $\Theta$ semantically entails a sentence $\phi$ (written $\Theta \models \phi$) if all models of $\Theta$ are also models of $\phi$.



3 The Theory of Events 

The theory of events begins with primary events, which can be thought of as defeasible STRIPS events [2]. Primary event types are defined by specifying their
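Table 1. Satisfaction and violation conditions for CTC (see Definition 7)

$$
\begin{array}{lll}
M,g \models t < t' & \text{iff} & (\mathcal{V}_g(t), \mathcal{V}_g(t')) \in \mathcal{R}_T \\
M,g \mathrel{=\!\!|} t < t' & \text{iff} & (\mathcal{V}_g(t), \mathcal{V}_g(t')) \notin \mathcal{R}_T \\
M,g \models u = u' & \text{iff} & \mathcal{V}_g(u) \text{ is } \mathcal{V}_g(u') \\
M,g \mathrel{=\!\!|} u = u' & \text{iff} & \mathcal{V}_g(u) \text{ is not } \mathcal{V}_g(u') \\
M,g \models r_S(u_1,\dots,u_n)(t) & \text{iff} & \mathcal{V}_S(r_S, \mathcal{V}_g(t))(\mathcal{V}_g(u_1),\dots,\mathcal{V}_g(u_n)) = true \\
M,g \mathrel{=\!\!|} r_S(u_1,\dots,u_n)(t) & \text{iff} & \mathcal{V}_S(r_S, \mathcal{V}_g(t))(\mathcal{V}_g(u_1),\dots,\mathcal{V}_g(u_n)) = false \\
M,g \models v(t) & \text{iff} & M,g \models \mathcal{V}_g(v)(t) \\
M,g \mathrel{=\!\!|} v(t) & \text{iff} & M,g \mathrel{=\!\!|} \mathcal{V}_g(v)(t) \\
M,g \models r_\Phi(u_1,\dots,u_n) & \text{iff} & \mathcal{R}_\Phi(r_\Phi)(\mathcal{V}_g(u_1),\dots,\mathcal{V}_g(u_n)) = true \\
M,g \mathrel{=\!\!|} r_\Phi(u_1,\dots,u_n) & \text{iff} & \mathcal{R}_\Phi(r_\Phi)(\mathcal{V}_g(u_1),\dots,\mathcal{V}_g(u_n)) = false \\
M,g \models v & \text{iff} & M,g \models \mathcal{V}_g(v) \\
M,g \mathrel{=\!\!|} v & \text{iff} & M,g \mathrel{=\!\!|} \mathcal{V}_g(v) \\
M,g \models \Box\psi & \text{iff} & M',g' \models \psi \text{ for every } M' \text{ and } g' \\
M,g \mathrel{=\!\!|} \Box\psi & \text{iff} & M',g' \not\models \psi \text{ for some } M' \text{ and } g' \\
M,g \models \neg\psi & \text{iff} & M,g \mathrel{=\!\!|} \psi \\
M,g \mathrel{=\!\!|} \neg\psi & \text{iff} & M,g \models \psi \\
M,g \models U\psi & \text{iff} & \text{neither } M,g \models \psi \text{ nor } M,g \mathrel{=\!\!|} \psi \\
M,g \mathrel{=\!\!|} U\psi & \text{iff} & \text{either } M,g \models \psi \text{ or } M,g \mathrel{=\!\!|} \psi \\
M,g \models \psi \wedge \chi & \text{iff} & M,g \models \psi \text{ and } M,g \models \chi \\
M,g \mathrel{=\!\!|} \psi \wedge \chi & \text{iff} & M,g \mathrel{=\!\!|} \psi \text{ or } M,g \mathrel{=\!\!|} \chi \\
M,g \models \forall v\psi & \text{iff} & M,g' \models \psi \text{ for all } g' \text{ such that } g \sim_v g' \\
M,g \mathrel{=\!\!|} \forall v\psi & \text{iff} & M,g' \mathrel{=\!\!|} \psi \text{ for some } g' \text{ such that } g \sim_v g'
\end{array}
$$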



preconditions and effects. The preconditions can be thought of as necessary conditions for the success of an event of this type, and the effects as the invariant effects of the event; examples are axioms (?) and (?) in Table ?. It is assumed that these definitions are natural in the sense that preconditions and effects are constructed entirely from fact and event atoms, that preconditions do not include posterior conditions, and that effects do not include prior effects. Thus it is assumed that in any definition instance $Pre(e)(t) \equiv \phi$ the sentence $\phi$ does not contain references to time points after $t$, and that in any instance $Eff(e)(t) \equiv \phi$ the sentence $\phi$ does not contain references to time points before $t$. The preconditions of a primary event should normally be sufficient, on its occurrence, for its effects, but will typically not logically guarantee them. Call the context in which an event occurs its context of occurrence. Then the preconditions of a primary event should be such that they are sufficient in most contexts of occurrence, but need not be sufficient in all of them. In order to represent this, success atoms are introduced. Intuitively, the success atom $Succ(e)(t)$ states that event $e$ succeeds at time $t$; that is, that $e$ occurs at $t$, its preconditions are true on occurrence, and its effects are true at $t+1$. This is stated by the success axiom, Axiom (1) in
Table 2. The theory of events, $\Theta_E$

$$
\begin{array}{lr}
\forall e,t\,(Succ(e)(t) \equiv T(Occ(e)(t) \wedge Pre(e)(t) \wedge Eff(e)(t+1))) & (1) \\
\forall e,t\,(Fail(e)(t) \equiv (Occ(e)(t) \wedge \neg Succ(e)(t))) & (2) \\
\forall e,e',t\,(Inv(e,e')(t) \rightarrow \neg Inv(e',e)(t)) & (3) \\
\forall e,e',t\,(Inv(e,e')(t) \rightarrow (Occ(e)(t) \wedge Occ(e')(t))) & (4) \\
\forall e,e',t\,((Inv(e,e')(t) \wedge Succ(e')(t)) \rightarrow \exists e''(Inv(e'',e')(t) \wedge Succ(e'')(t))) & (5) \\
\forall e,e',t\,(Inv^*(e,e')(t) \equiv (Inv(e,e')(t) \vee \exists e''(Inv(e,e'')(t) \wedge Inv^*(e'',e')(t)))) & (6) \\
\forall a,t\,(Inert(a)(t) \equiv (a(t) \equiv a(t+1))) & (7) \\
\forall a,t\,(Change(a)(t) \equiv \neg Inert(a)(t)) & (8)
\end{array}
$$



Table 2. Note that the presence of the truth operator in the axiom ensures that the $Succ$ relation is bivalent; that is, the sentence $\forall e,t(Succ(e)(t) \vee \neg Succ(e)(t))$ is true in any model of the axiom. An event is said to fail if its preconditions do not hold on occurrence, or its effects do not result; Axiom (2). The success axiom is intended to be used in order to infer change. Given $Occ(e)(t)$ and $Pre(e)(t)$, the success assumption, $Succ(e)(t)$, should be made whenever it is consistent to do so (whenever it is consistent with $e$'s context of occurrence), and the axiom used to conclude $Eff(e)(t+1)$.

Primary events have the defeasibility of natural events, but are unlike nat- 
ural events in that their effects, when successful, are invariant. But typically 
events also have context-dependent effects; for example, if block B2 is on block 
B1 when B1 is moved, then an additional effect of moving B1 is that B2 moves 
also. This limitation is overcome by introducing secondary events. Secondary 
events are defeasible strips events which are invoked by other (primary or sec- 
ondary) events in appropriate contexts, and their success depends on that of the 
events which invoke them. A common sense event can thus be thought of as a 
tree-structured object whose root is a primary event, and whose effects are the 
combined effects of all successful events in its invocation tree. 

Invocations are represented by invocation atoms, so that the atom $Inv(e,e')(t)$ states that event $e$ invokes event $e'$ at time $t$, and by invocation axioms of the form $\forall e,e',t((Occ(e)(t) \wedge \psi(e,e')(t)) \rightarrow Inv(e,e')(t))$, where $\psi(e,e')(t)$ is a formula which distinguishes those contexts in which $e$ invokes $e'$ at $t$; examples are axioms (?) and (?) in Table ?.

In keeping with the suggested properties of secondary events, the invocation relation is required to satisfy Axioms (3)–(6) in Table 2. Axiom (3) states that invocation is asymmetric. Axiom (4) requires that both the invoking and invoked events occur. Axiom (5) ensures that a secondary event succeeds only if it is directly invoked by a successful event. Finally, Axiom (6) defines the transitive closure, $Inv^*$, of the invocation relation. Thus a primary event should be thought of as inheriting all of the effects of the events that it successfully invokes (either directly or indirectly).
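The invocation tree and the inheritance of effects can be pictured with a small graph computation. This is an illustrative sketch, not part of the formal theory: events are plain strings, success is given as input rather than inferred, and the third block B3 (stacked on B2) is a hypothetical extension of the B1/B2 example above.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]   # (invoker, invoked)


def inv_star(inv: Set[Edge]) -> Set[Edge]:
    """Transitive closure of the invocation relation (cf. Axiom (6))."""
    closure = set(inv)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in inv:
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def combined_effects(root: str, inv: Set[Edge], succeeded: Set[str],
                     effects: Dict[str, Set[str]]) -> Set[str]:
    """Union of the effects of root and of every successful event reachable
    from it through invocations made by successful events."""
    if root not in succeeded:
        return set()
    result = set(effects.get(root, set()))
    frontier, seen = [root], {root}
    while frontier:
        e = frontier.pop()
        for a, b in inv:
            if a == e and b in succeeded and b not in seen:
                seen.add(b)
                frontier.append(b)
                result |= effects.get(b, set())
    return result


INV = {("move(B1)", "move(B2)"), ("move(B2)", "move(B3)")}
EFFECTS = {"move(B1)": {"at(B1,L2)"}, "move(B2)": {"at(B2,L2)"},
           "move(B3)": {"at(B3,L2)"}}
```

If `move(B2)` fails, the traversal stops there, so `move(B1)` inherits only its own effect; this mirrors the requirement that a secondary event succeeds only when invoked by a successful event.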

It is also necessary to represent inertia, or what is not changed by events. Intuitively, the inertia atom $Inert(a)(t)$ states that the truth value of fact $a$ does not change at time $t$; that is, that it persists to $t+1$. This is stated by the inertia axiom, Axiom (7). Note that the nested equivalence operator makes the $Inert$ relation bivalent. An atom changes truth value if it is not inert; Axiom (8). The intention is that the inertia axiom should be used to infer persistence of facts whenever possible. Given $a(t)$, the inertia assumption, $Inert(a)(t)$, should be made whenever it is consistent to do so (given the context of occurrence at $t$), and the axiom used to conclude $a(t+1)$.
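A procedural approximation of how the success and inertia assumptions are meant to be used is sketched below. The paper defines this through preferred models, not through a procedure, so treat this only as intuition. The assumptions: a single time step, primary events only, no conflicting effects, facts mapped to `True`/`False` with a missing key meaning undefined, and `known_failures` as a hypothetical input naming events whose success is inconsistent with the context.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable

State = Dict[str, bool]   # fact -> truth value; a missing fact is undefined


@dataclass
class PrimaryEvent:
    name: str
    pre: Dict[str, bool]   # required truth values at t
    eff: Dict[str, bool]   # truth values at t + 1 if the event succeeds


def step(state: State, occurring: Iterable[PrimaryEvent],
         known_failures: FrozenSet[str] = frozenset()) -> State:
    """Project the state from t to t + 1."""
    nxt = dict(state)                  # inertia assumption: every fact persists...
    for e in occurring:
        pre_true = all(state.get(f) is v for f, v in e.pre.items())
        if pre_true and e.name not in known_failures:
            nxt.update(e.eff)          # ...unless a successful event changes it
    return nxt


pickup = PrimaryEvent(
    "pickup(A)",
    pre={"handempty": True, "ontable(A)": True, "clear(A)": True},
    eff={"holding(A)": True, "handempty": False, "ontable(A)": False, "clear(A)": False},
)
```

Letting effects overwrite persisted values reflects the stated preference for success assumptions over inertia assumptions; an undefined precondition blocks the success assumption, and everything else simply persists.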

Definition 9 The theory of events, $\Theta_E$, consists of the axioms in Table 2; thus $\Theta_E = \{(1), \dots, (8)\}$. An event theory is any theory $\Theta = \Theta_E \cup \Theta_B \subseteq TC$, where the background theory $\Theta_B$ is natural; that is, all of its precondition and effect definitions are natural.

The intended interpretation of event theories is obtained by defining an ap- 
propriate formal pragmatics for them. As noted, the intended interpretations of 
the success and inertia axioms are the positive ones. Given the preconditions of 
an event occurring at time t, its success at t should be assumed whenever possi- 
ble, and the success axiom used to infer its effects at t + 1. Similarly, whenever 
possible it should be assumed that a fact is inert at t and the inertia axiom used 
to infer its persistence to t + 1. The temporal directedness of these interpreta- 
tions suggests that the intended models of causal theories are among those in 
which they are interpreted chronologically. Moreover, in order to generate the 
intended success and inertia assumptions, the context of occurrence at each time 
point should be minimal; that is, it should be restricted to that which is required by the previous pragmatic interpretation of the theory. These considerations suggest that the selected models of an event theory can be defined to be the chronologically minimal models of the theory [?].

However, a further refinement, prioritization, is necessary in order to estab- 
lish the context of occurrence at each time point and to generate the appropriate 
success and inertia assumptions given it. Thus the selected models of an event 
theory should be those chronologically minimal models of the theory in which, 
at each time point, facts and events are minimized before invocations, invoca- 
tions are minimized before maximizing success assumptions (by minimizing their 
negations), and success assumptions are maximized before maximizing inertia 
assumptions (by minimizing their negations). Minimizing facts, events and invo- 
cations at a time point has the effect of fixing the present context of occurrence 
before speculating about the future. Priority is given to the minimization of 
facts and events, as invocations depend on these. Maximizing success assump- 
tions before maximizing inertia assumptions has the effect that, whenever pos- 
sible, change is preferred to inertia. Thus, whenever possible, conflicts between 
the success axiom and the inertia axiom are resolved in favour of the former. 
An argument for doing so is as follows: maximizing inertia assumptions before 
success assumptions would have the effect that events always failed and nothing 
changed, while maximizing success and inertia assumptions with equal priority 
would result in the effects of events being much less predictable than we expect
them to be. Finally, in order to keep models as small as possible, TC-formula 
atoms are minimized. 
Definition 10 Let M and M' be CTC models which differ only on the interpretation of relations. Then M is E-preferred to M' (written $M \prec_E M'$) iff there is a time point t such that M and M' agree for any earlier time point and:

— at least one more domain atom or occurs atom is defined (is either true or 
false) in M' at t, or 

— M and M' agree on the interpretation of domain and occurs atoms at t, and 
at least one more invocation atom is defined in M' at t, or 

— M and M' agree on the interpretation of domain, occurs and invocation 
atoms at t, and at least one more success atom is false in M' at t, or 

— M and M' agree on the interpretation of domain, occurs, invocation and 
success atoms at t, and at least one more inertia atom is false in M' at t, or 

— M and M' agree on the interpretation of domain, occurs, invocation, success, 
and inertia atoms at t, and at least one more TC-formula atom is defined in
M' at t. 

A model M is an E-preferred model of a sentence $\phi$ iff $M \models \phi$ and there is no other model M' such that $M' \models \phi$ and $M' \prec_E M$. M is an E-preferred model of a set of sentences $\Theta$ iff $M \models \Theta$ and there is no other model M' such that $M' \models \Theta$ and $M' \prec_E M$.

An event theory $\Theta$ predicts a sentence $\phi$ iff $\phi$ is true in all E-preferred models of $\Theta$. Event theory $\Theta$ is pragmatically consistent iff there is at least one E-preferred model of $\Theta$.
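To make the lexicographic character of Definition 10 concrete, here is a minimal Python sketch. It is not the paper's implementation. The encoding of a model as, for each time point, a set of atom names per category (with success and inertia represented by their false atoms) is a hypothetical simplification, the category names are invented for illustration, and "at least one more ... in M'" is read as strict set inclusion, which is only one possible reading of the definition. Time points are scanned chronologically and categories in the priority order of the definition, so the first disagreement decides.

```python
from typing import Dict, FrozenSet

# Hypothetical category names, in the priority order of Definition 10.
# Each holds the atoms that count against a model at a time point:
# defined atoms, or success/inertia atoms that are false.
PRIORITY = (
    "domain_or_occurs_defined",
    "invocation_defined",
    "success_false",
    "inertia_false",
    "tc_formula_defined",
)

# time point -> category -> atoms
Model = Dict[int, Dict[str, FrozenSet[str]]]


def atoms(model: Model, t: int, category: str) -> FrozenSet[str]:
    """Atoms of a category at time t (empty if none are recorded)."""
    return model.get(t, {}).get(category, frozenset())


def e_preferred(m: Model, m2: Model) -> bool:
    """True iff m is E-preferred to m2, under the strict-inclusion reading.

    Earlier time points take precedence over later ones, and within a time
    point earlier categories take precedence over later ones; m wins only
    if, at the first disagreement, its atoms form a proper subset of m2's.
    """
    for t in sorted(set(m) | set(m2)):
        for category in PRIORITY:
            a, b = atoms(m, t, category), atoms(m2, t, category)
            if a != b:
                return a < b  # frozenset '<' tests for a proper subset
    return False
```

Because false success atoms are compared before false inertia atoms, a model in which a move event succeeds (giving up an inertia assumption) is preferred to one in which the event fails, which is the preference for change over inertia described above.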

The pragmatics can be made more concrete by considering model schemas. 
A single CTC-model can be thought of as a schema representing the class of 
all of its classical completions. This idea can be pushed further by adopting a 
canonical interpretation of terms (as in Herbrand models), for then each prag- 
matic interpretation of a theory can be represented by a single model schema. 
Moreover, for theories of the kind considered in this paper, each such schema can be represented by the set of facts (domain atoms) and event structures (occurs and invocation atoms) which are defined in it, since the remaining relations (Succ, Inert, Cause, etc.) are represented implicitly. For example, if the scenario of Example 1 below is simplified by removing block B2, then the model schema, M, for the resulting theory can be represented as follows:

$M/1 = \{Occ(Init)(0), At(B1,L1)(1), Occ(Move(B1,L1,L2))(1), Inv(Move(B1,L1,L2),Clear(L1))(1), Occ(Clear(L1))(1)\}$

$M/2 = M/1 \cup \{At(B1,L2)(2), \neg At(B1,L1)(2), Clear(L1)(2)\}$

We can then take a dynamic view of the evolving context of occurrence in a preferred model schema M. The context of occurrence at time t in M arises from the earlier pragmatic interpretation of the theory $\Theta$ (for example, M/1 above represents the context of occurrence in M at time 1); the axioms of $\Theta$, especially the success and inertia axioms, are then used to extend the context of occurrence in M to t + 1 (for example, to M/2). This approach is adopted in
the direct model-building implementation of the theory of primary events Q . 
Table 3. The theory of causation, $\Theta_C$

$\forall e,t,\phi\,(PSCause(Occ(e)(t),\phi) \equiv (Succ(e)(t) \land \Box(\bigwedge\Theta_B \rightarrow (Eff(e)(t+1) \equiv \phi))))$ (9)

$\forall e,e',t\,(CSCause(Occ(e)(t),Occ(e')(t)) \equiv (Inv(e,e')(t) \land \neg\exists e''\,(Inv^{*}(e'',e)(t) \land Inv(e'',e')(t))))$ (10)

$\forall e,t,\phi,\psi,\chi\,(SCause(Occ(e)(t),\phi) \equiv (PSCause(Occ(e)(t),\phi) \lor CSCause(Occ(e)(t),\phi) \lor (SCause(Occ(e)(t),\psi) \land \Box(\psi \rightarrow \phi)) \lor (SCause(Occ(e)(t),\psi) \land SCause(Occ(e)(t),\chi) \land \Box(\phi \equiv (\psi \land \chi)))))$ (11)

$\forall e,t,\phi\,(Cause(Occ(e)(t),\phi) \equiv (SCause(Occ(e)(t),\phi) \land \neg\exists e'\,(e' \neq e \land SCause(Occ(e')(t),\phi))))$ (12)

$\forall e,t,\phi,\psi,\chi\,(Causes(Occ(e)(t),\phi) \equiv (Cause(Occ(e)(t),\phi) \lor \exists e',t'\,(Cause(Occ(e)(t),Occ(e')(t')) \land Causes(Occ(e')(t'),\phi)) \lor (Causes(Occ(e)(t),\psi) \land Causes(Occ(e)(t),\chi) \land \Box(\phi \equiv (\psi \land \chi)))))$ (13)

$\forall e,t\,(Occ(e)(t) \equiv ((e = Init \land t = 0) \lor \exists e',t'\,SCause(Occ(e')(t'),Occ(e)(t))))$ (14)

$\forall a,t\,(Change(a)(t) \equiv (\exists e\,SCause(Occ(e)(t),a(t+1)) \lor \exists e\,SCause(Occ(e)(t),\neg a(t+1))))$ (15)



4 The Theory of Causation 

The formal definition of causation, which is given in Table 3, is expressed in terms of the theory of events (event occurrences, preconditions, effects, success, failure, invocations, facts, inertia, change), the logical notions of consequence (semantic entailment in CTC) and equivalence (semantic equivalence in CTC), and the pragmatic notion of the context of occurrence. The definition assumes the setting of a finite event theory $\Theta = \Theta_E \cup \Theta_B$, where $\Theta_E$ is the theory of events and $\Theta_B$ is the background theory.

Axiom (9) states that any event e which succeeds at time t is a prior sufficient cause (a PSCause) of its (direct posterior) effects. Thus e is a PSCause of $\phi$ if e succeeds at t and any model of the background theory $\Theta_B$ is also a model of the instance $Eff(e)(t+1) \equiv \phi$ of the effects axiom for e.

Axiom (10) states that the occurrence of event e is a contemporaneous sufficient cause (a CSCause) of the occurrence of event e' at t iff e invokes e' at t, and there is no event e'' which (directly or indirectly) invokes e and which also invokes e' at t. This requirement ensures that e' is causally dependent on e, and is illustrated by Example 1 below.

More abstractly, Axiom (11) states that e is a sufficient cause (an SCause) of effect $\phi$ at t if, at t, e is a prior sufficient cause of $\phi$, or e is a contemporaneous sufficient cause of $\phi$, or e is a sufficient cause of some $\psi$ which logically entails $\phi$, or e is a sufficient cause of both $\psi$ and $\chi$ and their conjunction is logically equivalent to $\phi$.

The occurrence of e at t is a direct cause (a Cause) of effect $\phi$ iff e is the only sufficient cause of $\phi$ at t; Axiom (12).
Indirect causation results from causally linked chains of events, each of which may terminate in a fact. Accordingly, the indirect-causation relation Causes is defined to be the transitive closure of the direct-causation relation Cause, and is closed under conjunction of effects; Axiom (13).
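Since Causes is defined as the transitive closure of Cause, the direct Cause facts derived in Example 1 below already determine several indirect ones. The following Python sketch computes only the transitive-closure part of Axiom (13). Closure under conjunction of effects is omitted, the conjunctive effect of the first Cause fact in Example 1 is split into its conjuncts, and atoms are plain strings. All of these are simplifying assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def causes_closure(cause: Iterable[Edge]) -> Set[Edge]:
    """Transitive closure of a direct-cause relation (depth-first search)."""
    succ = defaultdict(set)
    for c, e in cause:
        succ[c].add(e)
    closure: Set[Edge] = set()
    for start in list(succ):
        stack, seen = list(succ[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(succ.get(node, ()))  # .get does not grow the defaultdict
    return closure


# Direct Cause facts of Example 1, conjunctive effect split into conjuncts.
MOVE_B1 = "Occ(Move(B1,L1,L2))(1)"
MOVE_B2 = "Occ(Move(B2,L1,L2))(1)"
CLEAR = "Occ(Clear(L1))(1)"
CAUSE = {
    (MOVE_B1, MOVE_B2),
    (MOVE_B1, CLEAR),
    (MOVE_B1, "At(B1,L2)(2)"),
    (MOVE_B2, "At(B2,L2)(2)"),
    (CLEAR, "Clear(L1)(2)"),
}

CAUSES = causes_closure(CAUSE)
```

In this encoding the movement of B1 indirectly causes both At(B2,L2)(2) and Clear(L1)(2), and no pair appears in both directions, in line with the asymmetry property stated in Table 4.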

The definition of sufficient causation makes it possible to give elegant state- 
ments of two new laws, which restrict changes and event occurrences to those 
which have sufficient causes. In order to do so, it is assumed that any initial 
conditions of the background theory are covered by a distinguished initial event 
Init, which occurs at time 0. The law of occurrence, Axiom (14), requires that any event occurrence other than Init must have a sufficient cause. The law of change, Axiom (15), requires that any change in the truth value of a fact must have a sufficient cause. The impact of these laws is discussed below.

Definition 11 The theory of causation, $\Theta_C$, consists of the axioms given in Table 3; thus $\Theta_C = \{(9), \ldots, (15)\}$. If $\Theta_E \cup \Theta_B$ is a finite event theory, then $\Theta_C \cup \Theta_E \cup \Theta_B$ is said to be a causal theory.

Note that the definition of causation is reductive. In any given causal theory, 
all references to causation can be replaced by sentences containing only symbols 
from TC and the logical truth operator. As the interpretation of this operator is 
unaffected by the pragmatics for event theories, the same pragmatics can be used 
to interpret causal theories. The pragmatic interpretation of a causal theory $\Theta$ thus depends entirely on that of its constituent event theory and the (reductions of the) laws of occurrence and change. In particular, the Causes relation for $\Theta$ is determined by (is supervenient on) the relations of its event theory, formally echoing Hume's claim that the causal relation has no independent reality.

Clearly, if the occurrence of event e at time t is the Cause of effect $\phi$, then e's occurrence is contextually sufficient for $\phi$. In view of the laws of occurrence and change (axioms (14) and (15)), e's occurrence is also contextually necessary for $\phi$. In order to see this, consider the case in which e is the PSCause of event occurrence or domain atom $\phi$. Now, e can be removed from the context by removing all of its sufficient causes. This can be done without introducing new effects, as the success of each of e's sufficient causes implies that none of them conflicts with any other simultaneous event. As e is the Cause of $\phi$, it is its only SCause (Axiom (12)). So removing e from the context leaves $\phi$ without an SCause. So if $\phi$ is an event occurrence (a domain atom), then the law of occurrence (the law of change) requires that an additional event occurs at t as its SCause. Consequently, as the pragmatics minimizes event occurrences, removing e from the context also removes $\phi$ from it.

Proposition 1 (Properties of the Causes relation) Let $\Theta$ be a pragmatically consistent causal theory. Then the sentences listed in Table 4 are true in all E-preferred models of $\Theta$.

In conclusion, two examples are given to illustrate the detailed workings of 
the theory. 
Table 4. Properties of the Causes relation

Bivalence: $\forall\phi,\psi\,(Causes(\phi,\psi) \lor \neg Causes(\phi,\psi))$

Transitivity: $\forall\phi,\psi,\chi\,((Causes(\phi,\psi) \land Causes(\psi,\chi)) \rightarrow Causes(\phi,\chi))$

Asymmetry: $\forall\phi,\psi\,(Causes(\phi,\psi) \rightarrow \neg Causes(\psi,\phi))$

Actuality: $\forall\phi,\psi\,(Causes(\phi,\psi) \rightarrow (\phi \land \psi))$

Consistency: $\forall\phi,\psi\,(Causes(\phi,\psi) \rightarrow \neg Causes(\phi,\neg\psi))$

Conjunction: $\forall\phi,\psi,\chi\,((Causes(\phi,\psi) \land Causes(\phi,\chi)) \rightarrow Causes(\phi,\psi \land \chi))$

Consequence: $\forall\phi,\psi,\chi\,((Causes(\phi,\psi) \land \Box(\psi \rightarrow \chi)) \rightarrow Causes(\phi,\chi))$



Example 1. Consider the following simple blocks-world scenario. At time point 1, blocks B1 and B2 are at location L1, and B2 is on B1 (for simplicity, it is assumed that being above a location counts as being at it). Also at time 1, the event consisting of B1 moving to location L2 occurs. On the basis of this context of occurrence, it is expected that the move event will succeed, thereby causing B1 to move to L2. Moreover, as B2 is on B1, it is expected that the movement of B1 will cause B2 to move to L2. Finally, the movement of B1 should also cause L1 to become clear.

This scenario is represented by axioms (16)-(23) of Table 5. Let $\Theta = \Theta_C \cup \Theta_E \cup \{(16), \ldots, (24)\}$; then $\Theta$ predicts

$Cause(Occ(Move(B1,L1,L2))(1), Occ(Move(B2,L1,L2))(1) \land Occ(Clear(L1))(1) \land At(B1,L2)(2))$

$\land\ Cause(Occ(Move(B2,L1,L2))(1), At(B2,L2)(2))$

$\land\ Cause(Occ(Clear(L1))(1), Clear(L1)(2))$

Proof. Let M be an E-preferred model of $\Theta$. Then it follows by chronological minimization that Pre(Init)(0) and Occ(Init)(0) are the only domain, occurs or invocation atoms with temporal index $t \leq 0$ which are defined in M. As intended, it follows from axioms (…), (…) and (…) that Succ(Init)(0) is true (in M). So it follows from axioms (9) and (11) that Init is an SCause of (16).

As B2 is on B1 at time 1 (Axiom (16)), the occurrence of the event Move(B1,L1,L2) invokes the Move(B2,L1,L2) event (Axiom (19)), which also occurs (Axiom (…)), and has the event Move(B1,L1,L2) as sufficient cause (axioms (10) and (11)). Moreover, it follows from the minimization of occurs atoms at time 1 that Move(B1,L1,L2) is the only sufficient cause, and hence is the Cause (Axiom (12)).

Both move events invoke the Clear(L1) event at time 1. As the Move(B1,L1,L2) event invokes the Move(B2,L1,L2) event, the first of these move events is considered to be a CSCause of the clear event, and the second is not (Axiom (10)). It follows that the first is also an SCause (Axiom (11)) and the Cause (minimization of occurrences, Axiom (12)).

The preconditions of the two Move events are true at time 1 (axioms (16), (17)), and it is consistent to assume that the Move(B1,L1,L2) event succeeds, with the effect At(B1,L2)(2) (axioms (…), (18)). It follows that the event is a PSCause of this effect (Axiom (9)), an SCause (Axiom (11)), and its Cause (Axiom (12), minimization of occurs atoms).
Table 5. Axioms for the examples

$At(B1,L1)(1) \land At(B2,L1)(1) \land On(B2,B1)(1) \land Occ(Move(B1,L1,L2))(1)$ (16)

$\forall x,l,l',t\,(Pre(Move(x,l,l'))(t) \equiv At(x,l)(t))$ (17)

$\forall x,l,l',t\,(Eff(Move(x,l,l'))(t) \equiv (At(x,l')(t) \land \neg At(x,l)(t)))$ (18)

$\forall x,y,l,l',t\,((Occ(Move(x,l,l'))(t) \land On(y,x)(t)) \rightarrow Inv(Move(x,l,l'),Move(y,l,l'))(t))$ (19)

$\forall l,t\,(Clear(l)(t) \equiv \neg\exists x\,At(x,l)(t))$ (20)

$\forall l,t\,(Pre(Clear(l))(t) \equiv \neg Clear(l)(t))$ (21)

$\forall l,t\,(Eff(Clear(l))(t) \equiv Clear(l)(t))$ (22)

$\forall x,l,l',t\,((Occ(Move(x,l,l'))(t) \land At(x,l)(t)) \rightarrow Inv(Move(x,l,l'),Clear(l))(t))$ (23)

$Pre(Init)(0) \land Occ(Init)(0) \land (Eff(Init)(1) \equiv \text{(16)})$ (24)

$Pre(Init)(0) \land Occ(Init)(0) \land (Eff(Init)(1) \equiv (\text{(16)} \land At(B3,L1)(1)))$ (25)



It is also consistent to assume that the Move(B2,L1,L2) event succeeds; in particular, Axiom (…) is satisfied by the success of the Move(B1,L1,L2) event which invoked it. So it follows, as above, that the event succeeds and is the Cause of At(B2,L2)(2).

Similarly, the success of the Move(B1,L1,L2) event at time 1 provides grounds for assuming the success of the Clear(L1) event. It then follows, as above, that this is the Cause of Clear(L1)(2). □

In this example it is assumed that locations may contain more than one object and that events may occur simultaneously. So it is typically the case that a location only becomes clear as a result of the combined effect of several events and the non-occurrence or failure of others. A location's becoming clear is thus typically a global ramification: an indirect effect of several events. But in the example the Clear(L1) event is invoked locally by the movement of a block from a location, since only the occurrence of the move event and the location of the block are considered when invoking the clear event (Axiom (23)). This may appear reckless. Indeed, if the scenario is extended by adding another block, B3, at L1 but not on either B1 or B2, then we expect the Clear(L1) event to fail. However, as the pragmatics gives priority to change over inertia (by maximizing success assumptions before inertia assumptions), it seems that the Clear(L1) event should succeed in all E-preferred models of the extended theory, with the mysterious side effect of B3 becoming locationless at time 2. Yet this unintended outcome is prevented by the law of change (Axiom (15)). The law complements and completes the earlier representation of inertia. The inertia axiom is still needed in order to represent the temporal projection of unchanged facts. By restricting changes to those which have a sufficient cause, the law of change curbs the effect of the success axiom and strengthens the effect of the inertia axiom, thereby ensuring the proper balance between them. In particular, as illustrated by the following example, it forces the failure of events whose effects are not caused. Its presence thus means that events can be invoked locally and the consequences can be left to take care of themselves.
Example 2. Suppose that the background theory of Example 1 is extended by adding an axiom stating that block B3 is at location L1 initially. Then we expect that the movement of the other objects will not change the location of B3.

Let $\Theta' = \Theta_C \cup \Theta_E \cup \{(16), \ldots, (23), At(B3,L1)(1), (25)\}$; then $\Theta'$ predicts:

$\neg Cause(Occ(Move(B1,L1,L2))(1), Clear(L1)(2)) \land At(B3,L1)(2)$.

Proof. Let M be an E-preferred model of $\Theta'$. Then, as in Example 1, exactly three events occur at time 1 in M: Move(B1,L1,L2), Move(B2,L1,L2) and Clear(L1). Moreover, it is consistent to assume that the two move events succeed (in M).

Suppose, for contradiction, that it is consistent to assume that the Clear(L1) event also succeeds. Then L1 must be clear at time 2 (axioms (…), (22)). So it follows that B3 is no longer at L1 (Axiom (20)), and consequently it follows from the change and inertia axioms that Change(At(B3,L1))(1) is true. Consequently the law of change (Axiom (15)) requires that there is an SCause for $\neg At(B3,L1)(2)$. Clearly none of the three events occurring at time 1 has this effect. So it follows from the law of change and the bivalence of the Change relation that $\neg Change(Clear(L1))(1)$ is true. Consequently it follows from the change and inertia axioms that $\neg Clear(L1)(2)$ is true. But then it follows (axioms (…)) that $\neg Succ(Clear(L1))(1)$ is true.

On the other hand, it is consistent to assume that At(B3,L1) is inert at time 1 and to conclude, by the inertia axiom, that At(B3,L1)(2) is true. □

As can be seen from this example, the law of change governs the success of 
events: while caused change is preferred to inertia, inertia is preferred to uncaused 
change. Its presence thus results in a Yin/Yang interplay between the opposite 
but complementary principles of change and inertia. The principle of change 
appears to be dominant, as change is preferred to inertia and events can be
invoked whenever they might succeed. However, if the success of an event would 
give rise to changes which it did not cause, then the event fails and inertia 
prevails. 
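The interplay between success, inertia and the law of change in Examples 1 and 2 can be imitated by brute force. The Python sketch below is a hypothetical, single-step propositional simplification, not the paper's model-building implementation. Preconditions are assumed to hold; a change counts as caused only if it is a direct effect of a succeeded event (contemporaneous causation and logical consequence are ignored); maximizing success is approximated by trying larger sets of succeeded events first; and Axiom (20) is enforced by letting a clear L1 push every block off L1, which reproduces the "locationless B3" side effect.

```python
from itertools import combinations

# Direct effects of the three events occurring at time 1 (cf. axioms (18), (22)).
EFFECTS = {
    "Move(B1,L1,L2)": {"At(B1,L2)": True, "At(B1,L1)": False},
    "Move(B2,L1,L2)": {"At(B2,L2)": True, "At(B2,L1)": False},
    "Clear(L1)": {"Clear(L1)": True},
}


def initial_state(blocks):
    """Facts at time 1: every block at L1, none at L2, L1 not clear."""
    state = {"Clear(L1)": False}
    for b in blocks:
        state[f"At({b},L1)"] = True
        state[f"At({b},L2)"] = False
    return state


def successor(state, succeeded, blocks):
    """Facts at time 2: inertia by default, overridden by effects of succeeded
    events, then closed under Clear(L1) <-> no block at L1 (Axiom (20))."""
    nxt = dict(state)  # inertia: every fact persists unless changed
    for e in succeeded:
        nxt.update(EFFECTS[e])  # success: direct effects hold at t + 1
    if nxt["Clear(L1)"]:
        for b in blocks:  # a clear L1 forces every block off it
            nxt[f"At({b},L1)"] = False
    else:
        nxt["Clear(L1)"] = not any(nxt[f"At({b},L1)"] for b in blocks)
    return nxt


def respects_law_of_change(state, nxt, succeeded):
    """Every changed fact must be a direct effect of some succeeded event."""
    caused = {(f, v) for e in succeeded for f, v in EFFECTS[e].items()}
    return all((f, v) in caused for f, v in nxt.items() if state[f] != v)


def preferred_successes(state, events, blocks):
    """Try larger success sets first (change before inertia) and return the
    first whose consequences obey the law of change."""
    for k in range(len(events), -1, -1):
        for chosen in combinations(events, k):
            nxt = successor(state, chosen, blocks)
            if respects_law_of_change(state, nxt, chosen):
                return set(chosen), nxt
    return set(), dict(state)


EVENTS = ["Move(B1,L1,L2)", "Move(B2,L1,L2)", "Clear(L1)"]
```

With blocks B1 and B2 all three events succeed, as in Example 1. Adding B3 makes the Clear(L1) event fail while both move events succeed and B3 stays at L1, as in Example 2: caused change beats inertia, but inertia beats uncaused change.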
Abstract. In "Demonstratives" Kaplan claims that the occurrence of a demonstrative must be supplemented by an act of demonstration, like a pointing (a feature of the objective context). Conversely, in "Afterthoughts" Kaplan argues that the occurrence of a demonstrative must be supplemented by a directing intention (a feature of the intentional context). I present the two theories in competition and try to identify the constraints an intention must satisfy in order to have semantic relevance. My claim is that the analysis of demonstrative reference provides a reliable test for our intuitions on the relation between objective and intentional context. I argue that the speaker's intentions can play a semantic role only if they satisfy an Availability Constraint: an intention must be made available or communicated to the addressee, and for that purpose the speaker can exploit any feature of the objective context. This thesis implies a reconciliation between "Demonstratives" and "Afterthoughts".



1 Introduction 

As is well known, in "Demonstratives" David Kaplan claims that the occurrence of a demonstrative must be supplemented by an act of demonstration, like a pointing (a feature of the objective context). Conversely, in "Afterthoughts" Kaplan argues that the occurrence of a demonstrative must be supplemented by a directing intention, the referential intention the speaker associates with the expression (a feature of the intentional context). In this paper, I will present the two theories in competition and try to identify the constraints an intention must satisfy in order to have semantic relevance. My claim is that the analysis of demonstrative reference provides a reliable test for our intuitions on communicative mechanisms, and more specifically on the relation between objective and intentional context. In particular, I will argue that the speaker's intentions can play a semantic role only if they satisfy an Availability Constraint: an intention must be made available or communicated to the addressee, and for that purpose the speaker can exploit any feature of the objective context (words, gestures, relevance or uniqueness of the referent in the context of utterance). This thesis implies a reconciliation between "Demonstratives" and "Afterthoughts".

The structure of the paper is the following: 

In section 2, I present the distinction between indexicals and demonstratives.

In section 3, I analyse Kaplan's two theories of demonstratives.

In section 4, I offer a reconstruction of the objective perspective on context, according to which the reference of a demonstrative is determined by objective facts of the utterance context.

In section 5, I present a reconstruction of the intentional perspective on context, according to which the reference of a demonstrative is determined by adding certain features of the speaker's intention.

In section 6, I raise some objections against the intentional perspective on context.

In section 7, my analysis of demonstrative reference provides a test for our intuitions on communicative mechanisms, and more specifically on the relation between objective and intentional context.

In the conclusion, I argue that the speaker’s intentions can play a role in semantics 
only if they satisfy an Availability Constraint, that is to say if they can be recognised 
by the addressee. 



2 Indexicals and Demonstratives 

Indexicals and demonstratives are referential expressions depending, for their seman- 
tic value, on the context of utterance: they have a reference only given a context of 
utterance. The conventional meaning of an indexical sentence like 

(1) I am drunk,

independently of any context whatsoever, cannot determine the truth conditions of the 
sentence: to evaluate the sentence, the referent of I must be identified. The truth con- 
ditions of an indexical sentence are thus indirectly determined, as a function of the 
context of utterance of the sentence, and in particular as a function of the values of the 
indexicals. According to Kaplan and Perry, a function (or character) is assigned to 
each indexical expression as a type; given a context, the character determines the 
content of the occurrence - which is a function from circumstances of evaluation 
(possible world and time) to semantic values. 

In "Demonstratives", Kaplan introduces the distinction between pure indexicals 
(expressions like I, here, now) and demonstratives (expressions like this, that, she, 
he). As I said, the language conventions associate with a pure indexical as a type a 
rule fixing the reference of the occurrences of the expression in context. The semantic 
value of an indexical (its content, its truth conditional import) is thus determined by a 
conventional rule and by a contextual parameter, which is a publicly available aspect 
of the utterance situation (the objective context). The character of an indexical en- 
codes the specific contextual co-ordinate that is relevant for the determination of its 
semantic value: for I the relevant parameter will be the speaker of the utterance, for here the place of the utterance, for now the time of the utterance, and so on: the designation is then automatic, "given meaning and public contextual facts".¹
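Kaplan's two-stage picture (the character maps a context to a content; the content maps a circumstance of evaluation to a semantic value) can be sketched with higher-order functions. The Python below is only an illustration: the class and field names are hypothetical, contexts are reduced to agent, place, time and world, and referents are plain strings. Each pure indexical's character picks out one contextual coordinate and yields a content that returns the same referent in every circumstance.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Context:
    """A context of utterance, reduced to four coordinates."""
    agent: str
    place: str
    time: str
    world: str


@dataclass(frozen=True)
class Circumstance:
    """A circumstance of evaluation: a possible world and a time."""
    world: str
    time: str


Content = Callable[[Circumstance], str]   # circumstance -> semantic value
Character = Callable[[Context], Content]  # context -> content


def rigid(value: str) -> Content:
    """A content that yields the same referent in every circumstance."""
    return lambda circumstance: value


# Each pure indexical's character selects one contextual coordinate.
CHARACTER: Dict[str, Character] = {
    "I": lambda c: rigid(c.agent),
    "here": lambda c: rigid(c.place),
    "now": lambda c: rigid(c.time),
}
```

No entry is given for a demonstrative such as she: its conventional meaning alone does not supply a rule of this kind.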

Conversely, the meaning of a demonstrative, like she in the sentence 

(2) She is drunk, 

by itself doesn't constitute an automatic rule for identifying, given a context, the refer- 
ent of the expression. The semantics of she cannot determine unambiguously its refer- 
ence: if, for instance, in the context of utterance of (2) there is more than one woman, 
the expression she can identify any woman in the same way. 



1 [23], p. 595. 
3 Demonstration vs. Intention

According to Kaplan in "Demonstratives", the occurrence of a demonstrative must be supplemented by a demonstration, an act of demonstration like a pointing: "typically, though not invariably, a (visual) presentation of a local object discriminated by a pointing".² The relevant semantic unit is then the demonstrative associated with a demonstration.³ The act of demonstration is semantically relevant in order to complete the character of a demonstrative. The act of demonstration that could accompany a pure indexical is, in turn, either emphatic (as when one utters I pointing to oneself) or irrelevant (as when one utters I pointing to someone else: in this case, the referent of I remains the speaker): once the context of utterance is fixed, the linguistic rules governing the use of the indexicals determine their reference completely, automatically and unambiguously, no matter what the speaker's intentions are.

However, according to Kaplan, a demonstration does not always require an action 
on the speaker’s part, as when we shout 

(3) Stop that man 

if there is only one man, or only one man rushing toward the door, or only one man running completely naked. Or there may be a convention identifying the demonstratum with any object appearing on a "demonstration platform";⁴ or else the speaker may exploit a natural demonstration, such as an explosion or a shooting star.⁵ In this way, the speaker may exploit a gesture, or the uniqueness of the demonstratum in the context of utterance, or its salience, or its relevance. Likewise, we can interpret the cases of non-visual perceptual demonstratives in terms of the uniqueness or relevance of the demonstratum, as in

That noise is driving me crazy;

This smell is delicious;

This flavour reminds me of something.

All the examples, in fact, are appropriate only if there is only one noise (or smell or 
flavour), or only one relevant noise in the context of utterance. 

In "Afterthoughts", Kaplan modifies his own theory. He now acknowledges that 
even a gesture associated with an occurrence of a demonstrative, constituting the act
of demonstration, may be insufficient to disambiguate the expression. Just imagine 
the sentence 

(4) I like that 

uttered by someone pointing clearly and unambiguously to a dog: the expression that 
could designate the dog, or his coat, or a button of the coat, or the colour of the coat 
or, for that matter, any spatial region or molecule between the speaker’s finger and the 
dog. The gesture then does not have a semantic role anymore; for Kaplan the relevant 
factor is now "the speaker’s directing intention". The demonstration has only the role 
of manifesting the intention, of externalising it - a role of pragmatic aid to communi- 
cation: "I am now inclined... to regard the demonstration as a mere externalization of 
this inner intention. The externalization is an aid to communication, like speaking 



2 [16], p. 490. 

3 Cf. [16], p. 492: "The referent of a pure indexical depends on context, and the referent of a
demonstrative depends on the associated demonstration". 

4 Cf. [16], p. 525f.
5 Cf. [25], p. 200f. 
more slowly and loudly, but is of no semantic significance".⁶ Every occurrence of the same demonstrative as a type has to be associated not with an act of demonstration but with an intention.⁷ In this sense, a demonstrative is different from an indexical: once the context of utterance is fixed, the linguistic rules governing the use of the indexicals determine their reference completely, automatically and unambiguously, no matter what the speaker's intentions are.⁸

Kaplan doesn't offer an explicit and fully satisfactory explanation of why he now favours the intentional perspective (IPC, defined below) and thinks demonstrations are not semantically significant. The arguments are made explicit by Marga Reimer and Kent Bach in a group of articles published at the beginning of the 1990s in Analysis and Philosophical Studies. In what follows, I will reconstruct the two competing theories:

• the objective perspective on context (OPC): according to Kaplan 1977, the refer- 
ence of a demonstrative is determined by objective facts of the context of utter- 
ance. 

• the intentional perspective on context (IPC): according to Kaplan 1989, the refer- 
ence of a demonstrative is determined by completing the character of the demon- 
strative with features of the speaker’s intention. 

We will see that, according to Bach, Reimer doesn’t offer a fair reconstruction of IPC. 
In her reconstruction, the intentional perspective is reduced to a sort of Humpty 
Dumpty theory of language, according to which the speaker has a proposition in 
mind, and hopes that the addressee is a mind reader. I will first try to offer a better re- 
construction of IPC and then try to identify the constraints an intention must satisfy in 
order to have semantic relevance. 



4 Reimer and OPC 

It is usual to distinguish between: 

• the context in terms of intentional states of the participants, or shared assumptions⁹ - what we can call the subjective context, or the cognitive context, or the intentional context;

• the context in terms of relevant states of affairs occurring in the world - the objective context.¹⁰

As I said, the reference of a demonstrative doesn’t appear to be bound by semantic 
rules in the way the reference of an indexical seems to be: the semantic rule by itself 
doesn’t determine the reference of the demonstrative expression in the light of the 
context of utterance. The question to be answered is: what do we have to add to se- 
mantic rules and context of utterance in order to have a complete proposition: 



6 [18], p. 582.

7 [18], p. 588: "The directing intention is the element that differentiates the ’meaning’ of one syntactic occurrence of a demonstrative from another, creating the potential for distinct referents, and creating the actuality of equivocation".

8 Cf. [7]. For a different perspective on the pure indexicals/demonstratives distinction, see [9].

9 Assumptions actually shared, as in [10], or only supposedly shared, as in [30].

10 On the distinction between cognitive and objective context cf. [14], [21], [22], and [29].
- something like a demonstration, that is, a feature of the objective context (OPC),

or rather

- something like an intention, that is, a feature of the intentional context (IPC)?

To answer this question, let's examine some of Reimer's examples. In all cases, the 
reference of the demonstrative seems to be individuated by the speaker's gesture, or 
else by an element of the context in the objective sense, by public contextual facts. 

Case I. "Cases in which the demonstrated object is clearly not the object toward which the speaker has a 'directing intention'".¹¹ Suppose John grabs a bunch of keys on the desk, saying:

(5) These are mine. 

He intends to refer to his own keys, but mistakenly grabs his officemate's keys. Intui- 
tively, in this case, the reference is individuated by an objective aspect of the utter- 
ance situation, that is John's ostensive gesture. The keys on the desk belong to his of- 
ficemate, hence (5) is false. 

Case II. "Cases in which the demonstrated object is neither perceived by the speaker, nor the object the speaker 'has in mind'".¹² A classic example is provided by Kaplan in "Dthat". John points, without turning and looking, to the place on the wall which was occupied by a picture of Carnap and utters:

(6) That is a picture of one of the greatest philosophers of the twentieth century.

But, unbeknownst to him, the picture has been replaced by Spiro Agnew's portrait. Even if John intends to refer to Carnap's picture (or, as Kaplan writes, "has in mind" Carnap's picture¹³), he in fact refers to Agnew's picture: (6) cannot be taken as true.

Case III. "Cases in which there appears to be neither a demonstration nor a demonstratum, despite the presence of a 'directing intention'".¹⁴ Suppose that John and Mary are in the park, observing several dogs (all equally salient) playing and running together. John intends to point and refer to his dog Fido, and utters

(7) That dog is Fido 

but sudden paralysis prevents him from pointing or making any ostensive gesture, like 
nodding or glancing. According to Reimer, a supporter of IPC is committed to saying that, if it is the speaker's intention that rules, then the reference of that dog is the dog John "has in mind". However, our intuitions are different. Since no dog was being demonstrated, no dog was referred to: just as the description the black dog is empty if there is no black dog, the demonstrative description that dog is empty if no dog is demonstrated, and (7) doesn't express any proposition.

Case IV. If there is no demonstration, salience acquires semantic significance in
completing the character of the demonstrative. As in Case III, John and Mary are in
the park, observing several dogs playing and running together. John intends to point
and refer to his dog Fido, and utters (7), but sudden paralysis prevents him from
pointing or making any ostensive gesture. But suppose that Spot has made himself
especially salient by his hysterical barking. In this case, intuitively, the reference of that
dog seems to be individuated by salience. Mary is justified in taking John as referring to
the most salient dog in the context of utterance, no matter what John's intentions are.
The most salient dog in the context of utterance is Spot: (7) succeeds in expressing a
proposition, but a false one.

11 [25], p. 189.
12 [25], p. 190.
13 [17], p. 396.
14 [25], p. 190.

Case V. However, the ostensive gesture generally overrides salience. As in Case
III, John and Mary are in the park, observing several dogs playing and running
together. Suppose that Spot has made himself especially salient by his hysterical
barking. John intends to refer to his dog Fido and, pointing directly at Fido, utters (7).
Intuitively, in this case it is the gesture that has semantic significance and discriminates
the referent from the other candidates: even if another dog, Spot, was more salient in the
context of utterance, that dog refers to Fido and (7) is true.

Case VI. The ostensive gesture overrides the speaker's intentions. As in Case III,
John and Mary are in the park, observing several dogs playing and running together.
John intends to point and refer to his dog Fido, and utters (7), but a nervous tic makes
his arm move in the direction of another dog, Spot. Following the intentional
perspective, one should say that if it is the speaker's intention that rules, then the
reference is the dog John has in mind. But, intuitively, the reference seems to be
individuated by John's gesture - even if unintentional - and his intentions seem
irrelevant: (7) expresses a false proposition.

It seems, then, that in all the cases under examination the speaker's intention doesn't
play any essential role, that is, any semantic role, in determining the reference of the
demonstrative, which is fixed (when it is fixed) by the objective context.



5 Bach and IPC 

The main point of Bach's defence of IPC is to show that a communicative intention
requires more than just 'having in mind'. According to Bach's theory of referential
intentions, "a referential intention is part of a communicative intention, an intention
whose distinctive feature is that 'its fulfilment consists in its recognition'... A
referential intention... involves intending one's audience to identify something as the
referent by means of thinking of it in a certain identifiable way".¹⁵

Let's start with Kaplan's classic example (Case II). In Bach's reconstruction, two 
intentions must be attributed to the speaker: 

a. the intention to refer to Carnap's portrait; 

b. the intention to refer to the portrait on the wall behind him. 

Although John intended to refer to Carnap's portrait, he didn't intend his addressee to
recognise that intention (a.); the intention he intended the addressee to recognise was
the one referring to the portrait on the wall behind him (b.). The referential intention is
this last one: "the one which you intend and expect your audience to recognize and
rely on in order to identify a certain [picture] as the referent".¹⁶

The analysis of Kaplan's example is easily extended to Case I (John's keys).
Although John intends to refer to his own keys, he doesn't intend Mary to recognise this
intention; the intention he intends Mary to recognise is the one referring to the keys he
grabbed. The semantically relevant intention is this last one. Even if John intends to
refer to his own keys, he in fact refers to the keys he grabbed, which happen to
belong to Mary. John's words express the proposition that the keys he grabbed are his:
since they belong to Mary, (5) is false.

15 [3], p. 296. On referential intentions, see also [5] and [6]. As is well known, Bach's
theory is a development of Grice's, and of his intention-based and inferential view of
communication.

16 [4], p. 143.

Let's now consider Case III (the paralysis). Although John intends to refer to his own
dog, he doesn't intend Mary to recognise this intention; the intention he intends Mary
to recognise is the one referring to the dog he is pointing at. But of course he has not
done what is necessary to enable Mary to recognise this very intention: so, Bach argues,
the relevant intention is empty: "[IPC] does not say that such an intention can be
fulfilled even if no act of demonstration is performed when, as in the example, the
fulfilment of this intention requires such an act. After all, the intention in this case is to
refer to what is being pointed at".¹⁷

Case IV (salience). Although John intends to refer to his own dog, he doesn't
intend Mary to recognise this intention; the intention he intends Mary to recognise is
the one referring to the relevant dog in the context of utterance. The semantically
relevant intention is this last one: there is no act of pointing, no explosion or falling star;
in other words, there is no further evidence - except relevance - permitting Mary to
identify John's communicative intention. John's words express the proposition that the
relevant dog in the context of utterance is his: since the relevant dog is the dog
barking hysterically, and since Spot, and not Fido, is barking hysterically, (7) is false.

The same goes for Case V (the gesture overriding salience). Although John intends
to refer to his own dog, he doesn't intend Mary to recognise this intention; the
intention he intends Mary to recognise is the one referring to the dog he is pointing at.
Here IPC agrees with OPC.

Case VI (John's tic). Although John intends to refer to his own dog, he doesn't
intend Mary to recognise this intention; the intention he intends Mary to recognise is
the one referring to the dog he is pointing at. The semantically relevant intention is this
last one, for the act of pointing (even if unintentional - but, and this is crucial, not
recognised as such) is the only evidence permitting Mary to identify John's
communicative intention. John's words express the proposition that the dog he is
pointing at (Spot) is his: (7) is false.

Let's sum up. Suppose that the speaker utters the expression that dog: if the dog he
intends to refer to is the only dog in the context of utterance, or the most salient dog,
the demonstrative expression doesn't require any other action on the speaker's part. In
all other cases, where there are several equally salient dogs, the speaker must
complete the character of the demonstrative expression with an act of demonstration,
such as pointing, glancing, or nodding. The speaker then has the referential intention to
refer to the dog he is pointing at: notice that pointing is only a way of making an object
salient, and has no semantic significance, but only a pragmatic one - like speaking more
slowly and loudly.



6 Some Objections against IPC 

I agree with Bach's analysis, and with his distinction between two kinds of intentions in
a referential act: background intentions (such as the intention of referring to Fido, or to
Carnap's picture) and fundamental intentions (such as the intention of referring to the
dog the speaker is pointing at, or to the portrait on the wall behind him). Yet, in my
opinion, even if interpreted in this way, IPC may still raise some objections. Let's see
some of them.

Case VII. Suppose that John and Mary are in the park, observing several dogs (all 
equally salient) playing and running together. John has the intention of showing Mary 
his dog Fido; to help her discriminate his dog among all the other dogs, he tells her 
that Fido has a bad limp. Then, pointing at Fido, he utters: 

(7) That dog is Fido. 

The reference of the expression that dog is Fido, hence (7) is true.

Case VIII. Like case VII, with the following exceptions: Fido clearly has no limp,
but another dog, Spot, clearly has. Though Fido is in the most direct line with John's
finger, John could possibly be taken as pointing, perhaps not too precisely, at Spot.
Limping is the most relevant contextual information for discriminating the referent; in
case VIII the reference of the expression that dog is Spot, hence (7) is false.

Case IX. Like case VII, with the following exceptions: John has told Mary many of
Fido's distinctive features: he has a bad limp, is huge and ferocious-looking, has a
black leather collar with studs, and looks like a pit bull. All these things are true of
Fido, except for the limp, and no other dog in the park is remotely like that, especially
Spot, who has a bad limp, but is small and frail, has a red collar, and looks like a French
poodle. In this scenario, Mary has enough independent contextual information to
discriminate the reference of that dog: the reference is Fido, hence (7) is true.

It seems that the speaker's intentions are neither necessary nor sufficient to fix the
reference of a demonstrative. In case VIII, the reference (Spot) is fixed despite John's
intentions, which have Fido as their object. In case IX, the reference (Fido) is fixed
independently of John's intentions: even if John associated no intention with his use of
the demonstrative, the reference would be discriminated by the information previously
given. Not just any intention, then, is a good candidate to fix the reference of a
demonstrative. Let's examine one last case.

Case X. Like case VII, with the following exception: Spot has made himself
especially salient by his hysterical barking. Suppose that John utters (7) with the intention
of referring to Fido, a non-salient dog at which John is not pointing. In this context,
John's intention of referring to Fido, without any gesture, nod, or glance, would be
bizarre, i.e., unconnected with any context or behaviour that would enable Mary to
discriminate the intended dog.



7 Good Intentions 



IPC, as I interpret it, requires communicative intentions to be non-arbitrary, that is,
connected with a behaviour that will enable the addressee to identify the referent.¹⁸ In
other words, an intention, to be semantically relevant, must satisfy what I propose to
call an Availability Constraint, that is, it must be communicated or made available to
the addressee.¹⁹ Mary can't recognise just any intention John could have: she can't read
John's mind. In case X, the only manifest basis for Mary to identify John's
communicative intention is the presence, in the context of utterance, of a dog having made
himself especially salient (for instance by his hysterical barking).

18 On this point, see [28], p. 198: Roberts speaks of "reasonable referential intentions", basing
his argument on Donnellan's treatment of reasonable expectations and intentions: "On
Donnellan's view... one's intentions are limited by reasonable expectations, which in turn are
limited by established practices and particular stipulations" (p. 196); cf. [11], pp. 212-214.

Let me state my point once again, in a slightly different way.²⁰ According to
Reimer, there are only two plausible accounts of the proposition John's words express in
case X:

- a) Spot belongs to John; 

- b) Fido belongs to John. 

Following Bach’s theory of communicative intentions, we should say that in case X 
the proposition John’s words express is: 

- c) the relevant dog belongs to John, 
or 

- d) the dog John succeeded in calling Mary's attention to belongs to John.

Since the relevant dog is the one barking hysterically, Spot, and since Spot doesn't
belong to John, (7) is false.

Likewise in Kaplan’s classic example 

(6) That is a picture of one of the greatest philosophers of the twentieth century, 
there are three accounts of the proposition expressed by John’s words: 

- a) the picture of Agnew is a picture of one of the greatest philosophers of the 
twentieth century; 

- b) the picture of Carnap is a picture of one of the greatest philosophers of the 
twentieth century; 

- c) the picture on the wall behind him is a picture of one of the greatest philosophers 
of the twentieth century. 

c) is the proposition expressed by (6): since the picture on the wall behind John is
Agnew's portrait, (6) cannot be taken as true. The proposition c) can account both for
what John's words express and for what John wants to convey; b) is the proposition
that John expects Mary to infer on the basis of the proposition c), which is the
proposition his words express: c) satisfies the Availability Constraint, but b) doesn't.

Not every intention satisfies the Availability Constraint, only the "good" ones. A
"good" communicative intention is something an addressee, in normal circumstances,
is able to work out using

1. external facts, 

2. linguistic co-text, 

3. background knowledge. 

Of course, these three kinds of contextual information are nothing more than a way of
spelling out relevance.²¹



19 But, in my opinion, not to any competent speaker, as Garcia Carpintero proposes; cf. [12], p.
537: "I will take demonstrations to be sets of deictical intentions manifested in features of the
context of utterance available as such to any competent user". On this point, see [8], chapter
X; Marina Sbisa suggests extending this availability constraint to all the "relevant
participants" (personal communication).

20 I am indebted to Chris Gauker for helping me reformulate my argument in the following
way.

21 I am well aware that relevance needs a far more accurate definition than the one given in
this paper: for a more detailed analysis, see [8].
1. First, we have the information inferred from the extralinguistic or physical
context, available to both speaker and addressee. As we said, the demonstrative
expression that dog doesn't require any action on the speaker's part if the dog he intends to
refer to is the only dog in the context of utterance, or the only dog among cats and
birds, or the most salient dog (for "external" reasons, for example, his behaviour)
in the context of utterance.

2. Second, we have the information inferred from the linguistic co-text. Suppose
that, during the conversation in the park, John and Mary mention Spot; in this case a
demonstrative (non-anaphoric) use of

(8) That dog costs a fortune

will refer quite naturally to Spot. Notice that it is possible to build more sophisticated
examples, referring not only to objects explicitly mentioned in the previous
conversation, but also to objects that are only presupposed. In the same situation, if John utters

(9) That collar costs a fortune

the demonstrative expression that collar will refer to Spot's collar, even if no collar
has been mentioned in the conversation.

3. Third, we have the information inferred from the knowledge shared by speaker
and addressee, because they belong to the same community, or to the same
sub-community. Just think of the vast amount of information two friends share, and
may take as a basis for recognising their interlocutor's communicative intentions.
Suppose that John loves big, ferocious dogs, and Mary knows it. They are in a park
observing several dogs, all equally salient (for external reasons), and John utters

(10) That dog is mine.

Mary will easily determine the reference of that dog if there are dozens of French
poodles but only one Rottweiler.



8 Conclusion 

In my paper, I have presented two competing perspectives on the problem of
determining demonstrative reference - OPC and IPC - and I have tried to offer a
fair reconstruction of IPC. According to Kaplan 1989, the addressee must take into
account the speaker's intentions to identify the reference of demonstratives. In my
paper, the analysis of demonstrative reference has provided a reliable test for our
intuitions on communicative mechanisms, and more specifically on the relation between
objective and intentional context. This analysis has therefore been the starting point
for a more general reflection on the notion of communicative intention. Examples
have been provided to argue that the speaker's communicative intentions can play a
semantic role only if they satisfy an Availability Constraint, that is to say, if they are
reasonable and not arbitrary, and can be recognised by the addressee: reference is
determined by public behaviour, by intentional acts, and not by intentions as mental
objects.²² In other words, to be semantically relevant, an intention must be made
available or communicated to the addressee, and for that purpose the speaker can exploit
any feature of the objective context - words, gestures, relevance or uniqueness of the
referent in the context of utterance: elements of the intentional context can be
identified only through the identification of elements of the objective context.²³ This thesis
implies a reconciliation between "Demonstratives" - in which Kaplan claims that
the occurrence of a demonstrative must be supplemented by a demonstration, like a
pointing (a feature of the objective context) - and "Afterthoughts" - in which,
conversely, Kaplan argues that the occurrence of a demonstrative must be supplemented
by a directing intention, the referential intention the speaker associates with the
expression (a feature of the intentional context).²⁴

22 Cf. [28], p. 199.
Abstract. The development of more and more complex distributed
applications over large networks of computers has raised the problem
of semantic interoperability across applications based on local and
autonomous semantic schemas (e.g., concept hierarchies, taxonomies,
ontologies). In this paper we propose to view each semantic schema as
a context (in the sense defined in [?]), and propose an algorithm for
automatically discovering relations across contexts (where relations are
defined in the sense of [?]). The main feature of the algorithm is that
the problem of finding relationships between contexts is encoded as a
problem of logical satisfiability, and so the discovered mappings have a
well-defined semantics. The algorithm we describe has been implemented
as part of a peer-to-peer system for Distributed Knowledge Management,
and tested on significant cases.



1 Introduction 

The development of more and more complex distributed applications over large
networks of computers has created a whole new class of conceptual, technical,
and organizational problems. Among them, one of the most challenging is
the problem of semantic interoperability, namely the problem of allowing the
exchange of meaningful information/knowledge across applications which (i) use
autonomously developed conceptualizations of their domain, and (ii) need to
collaborate to achieve their users' goals.

Essentially, there are two main approaches to solving the problem of semantic
interoperability. The first is based on the availability of shared semantic
structures (e.g., ontologies, global schemas) onto which local representations can be
totally or partially mapped. The second is based on the creation of a global
representation which integrates local representations. Neither approach seems
suitable in scenarios where: (i) local representations are updated and changed
very frequently, (ii) each local representation is managed in full autonomy w.r.t.
the other ones, (iii) local representations may appear and disappear at any time,
(iv) the discovery of semantic relations across different representations can be
driven by a user's query, and thus can neither be computed beforehand (runtime
discovery) nor take advantage of human intervention (automatic discovery).
In this paper we propose an approach in which local schemas are viewed as
contexts, namely as partial and approximate representations of the world from
an individual's or a group's perspective (two simple examples of schemas are
the two directory structures from Google and Yahoo in Figure 1). This approach,
which is motivated by the work on Distributed Knowledge Management (DKM)
[?], is based on the assumption that a successful knowledge-based application
should not "force" people to change their way of looking at things (encoded, for
example, in a database schema or in the classification of a document management
system), as the imposed schema would be perceived "either as oppressive or
irrelevant" [?]. Thus, from our perspective, local schemas play the role of a lens
through which people look at the world and make sense of it. In a word, a schema
is the context in which facts are taken as true, decisions are made, objects are
classified, and relations among things are asserted and understood.

The problem with such a vision is that communication across different local
schemas (contexts) becomes difficult. The algorithm we present in this paper is
precisely a first solution to the problem of runtime and automatic discovery of
semantic relations across autonomous contexts. More specifically, we start from
a broad family of schemas (called concept hierarchies) and present a method for
discovering the type of relation existing between two nodes (each representing
a concept) belonging to different schemas. The main feature of the algorithm is
that the problem of finding relations between concepts in different contexts is
encoded as a problem of logical satisfiability of a set of formulae. This allows
us to assign a precise semantics to each discovered mapping. In particular, we
claim that the correct semantics for a mapping between concepts of different
contexts is in terms of a compatibility relation (as defined in [?]), namely as a
constraint on the local interpretations of the two contexts that are compatible
with each other. In this sense, the algorithm we present is a first attempt to
discover (rather than assume) relations over local models of two or more contexts
(which, from a proof-theoretical point of view, corresponds to discovering "bridge
rules" [?] across contexts).

The paper is organized as follows. First, we characterize the scenarios that motivate
our approach, and explain why we use the theory of context as the theoretical
background of the algorithm. Then, we describe the macro-blocks of the
algorithm, namely semantic explicitation and context mapping via SAT. Finally, we
describe the results of our preliminary tests and briefly compare our algorithm
with some other proposals in the literature.

2 Motivating Scenarios 

The work on the algorithm was originally motivated by research on Distributed
Knowledge Management [?], namely a distributed approach to managing corporate
knowledge in which users (or groups of users, e.g. communities) are allowed
to organize their knowledge using autonomously developed schemas (e.g.,
directories, taxonomies, corporate ontologies), and are then supported in finding
relevant knowledge in other local schemas available in the corporate network.
[Figure: two directory trees, panels titled www.google.com and www.yahoo.com]

Fig. 1. Examples of concept hierarchies (source: Google and Yahoo)



In this scenario, the algorithm we present aims at solving the following
problem. Let $s$ (the source schema) and $t$ (the target schema) be two autonomous
schemas that different users (or groups) use to organize and access a local body
of data. Given a concept $k_s$ in $s$ and a concept $k_t$ in $t$, what is the semantic
relation between $k_s$ and $k_t$? For example, are the two concepts equivalent? Or
is one more (or less) general than the other? In addressing this problem, it is
assumed that the basic elements of each schema are described using words and
phrases from natural language (e.g., English, Italian); this reflects the intuition
that schemas encode a lot of implicit knowledge, which can be made explicit
only if one has access to the meaning of the words that people use to denote
concepts in the schema.

Scenarios with similar features can be found in other important application
domains, such as the semantic web (where each site can have a semantic
description of its contents and services); marketplaces (where every participating
company may have a different catalog, and every marketplace may adopt a
different standard for cataloging products); search engines (some of them, e.g.
Google and Yahoo, provide heterogeneous classifications of web pages in web
directories); and the file systems on the PCs of different users (where each user stores
documents in different directory structures). The class of applications to which
our algorithm can be applied is therefore quite broad.

3 Local Schemas as Contexts 

In many interesting applications, schemas are directed graphs whose nodes and
edges are labeled with terms or phrases from natural language. A typical example
is depicted in Figure 1, whose structures are taken from the Google and Yahoo
directories. In this section, we briefly argue why we interpret these schemas as
contexts in the sense of [?] (see [?] for a formalization).

In schemas like the ones in the figure, the meaning of a label depends not
only on its linguistic meaning (what a dictionary or thesaurus would say about
that word or phrase), but also on the context in which it occurs: first, it depends
on the position in the schema (e.g., the documents we as humans expect to
find under the concept labeled Baroque in the two structures in Figure 1 are
quite different, even if the label is the same and is used in the same linguistic
sense); second, it depends on background knowledge about the schema itself
(e.g., knowing that there are chats and forums about literature helps in understanding
the implicit relation between these two concepts in the left-hand schema). These
contextual aspects of meaning are distinct from (though related to) purely linguistic
meaning, and we want to take them into account in our algorithm.

For this purpose, the algorithm we present in this paper is applied to contexts
rather than directly to schemas. In [?], a context is viewed as a box whose content
is an explicit (partial, approximate) representation of some domain, and whose
boundaries are defined by a collection of assumptions which hold about the
explicit representation. The notion of context we use in this paper is a special
case of this notion. A context is defined as a pair $c = \langle R_c, A_c \rangle$, where:

1. $R_c$ is a graph, whose nodes and edges can be labeled with expressions from
natural language;

2. $A_c$ is a collection of explicit assumptions, namely attributes
(parameter/value pairs) that provide meta-information about the content of the
context.

In the current version of the algorithm, we restrict ourselves to the case in
which $R_c$ is a concept hierarchy (see Def. 1), and the explicit assumptions $A_c$
are only three: the id of the natural language in which labels are expressed (e.g.,
English, Italian), the type of reference structure $R_c$ used for the explicit representation (the
only accepted value, at the moment, is "concept hierarchy", but in general other
values will be allowed, e.g., taxonomy, ontology, semantic network, frame), and
the domain theory (see below for an explanation of this parameter). Their role
will become apparent in the description of the algorithm.

A concept hierarchy is defined as follows: 

Definition 1 (Concept hierarchy). A concept hierarchy is a triple
$H = \langle K, E, l \rangle$ where $K$ is a finite set of nodes, $E$ is a set of arcs on $K$, such that
$\langle K, E \rangle$ is a rooted tree, and $l$ is a function from $K \cup E$ to a set $L$ of strings.
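As an illustration only, Definition 1 can be sketched as a small Python data structure. The class name `ConceptHierarchy` and its methods are hypothetical, arcs are assumed to be stored as (parent, child) pairs, and the toy labels below are invented rather than taken from Fig. 1. The sketch merely checks the two requirements stated in the definition: that $\langle K, E \rangle$ is a rooted tree, and that $l$ labels every node and every arc.

```python
from dataclasses import dataclass, field

Edge = tuple[str, str]  # an arc, assumed to be stored as (parent, child)


@dataclass
class ConceptHierarchy:
    """Hypothetical sketch of H = <K, E, l> from Definition 1."""
    nodes: set[str]                             # K
    edges: set[Edge]                            # E, arcs on K
    labels: dict = field(default_factory=dict)  # l: K ∪ E -> strings

    def root(self) -> str:
        # The root is the only node that is nobody's child.
        roots = self.nodes - {child for _, child in self.edges}
        if len(roots) != 1:
            raise ValueError(f"expected one root, found {sorted(roots)}")
        return next(iter(roots))

    def is_rooted_tree(self) -> bool:
        # Arcs must join known nodes, and no node may have two parents.
        children_of: dict[str, list[str]] = {}
        has_parent: set[str] = set()
        for parent, child in self.edges:
            if parent not in self.nodes or child not in self.nodes or child in has_parent:
                return False
            has_parent.add(child)
            children_of.setdefault(parent, []).append(child)
        try:
            start = self.root()
        except ValueError:
            return False
        # With unique parents, reaching every node from the root rules out cycles.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            seen.add(node)
            stack.extend(children_of.get(node, []))
        return seen == self.nodes

    def labels_total(self) -> bool:
        # l must assign a label to every node and every arc.
        return set(self.labels) == self.nodes | self.edges


# Toy hierarchy with invented labels; arc labels are left empty here.
toy = ConceptHierarchy(
    nodes={"arts", "music", "baroque"},
    edges={("arts", "music"), ("music", "baroque")},
    labels={"arts": "Arts", "music": "Music", "baroque": "Baroque",
            ("arts", "music"): "", ("music", "baroque"): ""},
)
# Not a tree: node "c" has two parents (and there are two candidate roots).
bad = ConceptHierarchy(nodes={"a", "b", "c"}, edges={("a", "c"), ("b", "c")})

print(toy.is_rooted_tree(), toy.root(), bad.is_rooted_tree())  # True arts False
```

Keeping arcs as explicit pairs mirrors the definition, where $l$ is defined on $K \cup E$, so that arcs can carry labels just as nodes do.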



Definition 2 (Hierarchical classification). A hierarchical classification of a
set of documents $D$ in a concept hierarchy $H = \langle K, E, l \rangle$ is a function
$\mu : K \to 2^D$.



$\mu$ satisfies the following specificity principle: a user classifies a document $d$
under a concept $k$ if $d$ is about $k$ (according to the peer) and there is no more
specific concept $k'$ under which $d$ could be classified.¹

1 See Yahoo's instructions for "Finding an appropriate Category" at
http://docs.yahoo.com/info/suggest/appropriate.html.
Mappings between contexts are defined as follows:

Definition 3 (Mapping function). A mapping function $M$ from
$H = \langle K, E, l \rangle$ to $H' = \langle K', E', l' \rangle$ is a function $M : K \times K' \to rel$, where $rel$ is
a set of symbols, called the possible mappings.

The set $rel$ of possible mappings we consider in this paper contains the
following: $k_s \xrightarrow{\sqsupseteq} k_t$, for $k_s$ is more general than $k_t$;
$k_s \xrightarrow{\sqsubseteq} k_t$, for $k_s$ is less general than $k_t$;
$k_s \xrightarrow{\cap} k_t$, for $k_s$ is compatible with $k_t$;
$k_s \xrightarrow{\perp} k_t$, for $k_s$ is disjoint from $k_t$;
$k_s \xrightarrow{\equiv} k_t$, for $k_s$ is equivalent to $k_t$. The formal semantics of these expressions
is given in terms of compatibility between document classifications of $H_s$ and
$H_t$:



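Definition 4. A mapping function $M$ from $H_s$ to $H_t$ is extensionally correct
with respect to two hierarchical classifications $\mu_s$ and $\mu_t$ of the same set of
documents $D$ in $H_s$ and $H_t$, respectively, if the following conditions hold for any
$k_s \in K_s$ and $k_t \in K_t$:

$$
\begin{aligned}
k_s \xrightarrow{\sqsupseteq} k_t &\;\Rightarrow\; \mu_s(k_s{\downarrow}) \supseteq \mu_t(k_t{\downarrow})\\
k_s \xrightarrow{\sqsubseteq} k_t &\;\Rightarrow\; \mu_s(k_s{\downarrow}) \subseteq \mu_t(k_t{\downarrow})\\
k_s \xrightarrow{\perp} k_t &\;\Rightarrow\; \mu_s(k_s{\downarrow}) \cap \mu_t(k_t{\downarrow}) = \emptyset\\
k_s \xrightarrow{\equiv} k_t &\;\Rightarrow\; \mu_s(k_s{\downarrow}) = \mu_t(k_t{\downarrow})\\
k_s \xrightarrow{\cap} k_t &\;\Rightarrow\; \mu_s(k_s{\downarrow}) \cap \mu_t(k_t{\downarrow}) \neq \emptyset
\end{aligned}
$$

where $\mu(c{\downarrow})$ is the union of $\mu(d)$ for any $d$ in the subtree rooted at $c$.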
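The following is a minimal sketch of Definition 4, assuming hierarchies are given as child lists and classifications as dictionaries from nodes to sets of document identifiers. The string names used for the five possible mappings (`more_general`, `less_general`, `disjoint`, `equivalent`, `compatible`) are hypothetical stand-ins for the symbols in $rel$. For brevity, the mapping is checked only on the pairs it lists rather than on the whole of $K_s \times K_t$, and the toy documents are invented.

```python
from collections.abc import Mapping

# Hypothetical names for the five possible mappings in rel, each paired
# with the extensional condition of Definition 4 on (mu_s(k_s↓), mu_t(k_t↓)).
CONDITIONS = {
    "more_general": lambda a, b: a >= b,     # superset
    "less_general": lambda a, b: a <= b,     # subset
    "disjoint": lambda a, b: not (a & b),    # empty intersection
    "equivalent": lambda a, b: a == b,       # same documents
    "compatible": lambda a, b: bool(a & b),  # non-empty intersection
}


def down(mu: Mapping[str, set[str]], children: Mapping[str, list[str]], k: str) -> set[str]:
    """mu(k↓): union of mu(d) over every node d in the subtree rooted at k."""
    docs: set[str] = set()
    stack = [k]
    while stack:
        node = stack.pop()
        docs |= mu.get(node, set())
        stack.extend(children.get(node, []))
    return docs


def extensionally_correct(mapping, mu_s, children_s, mu_t, children_t) -> bool:
    """Check Definition 4 on every (k_s, k_t) pair listed in `mapping`."""
    return all(
        CONDITIONS[rel](down(mu_s, children_s, k_s), down(mu_t, children_t, k_t))
        for (k_s, k_t), rel in mapping.items()
    )


# Toy data (hypothetical): one source and one target hierarchy over documents d1..d4.
children_s = {"music": ["baroque"]}
mu_s = {"music": {"d1"}, "baroque": {"d2", "d3"}}
children_t = {"classical": ["baroque_t"]}
mu_t = {"classical": {"d4"}, "baroque_t": {"d2"}}

print(sorted(down(mu_s, children_s, "music")))  # ['d1', 'd2', 'd3']
print(extensionally_correct({("music", "baroque_t"): "more_general"},
                            mu_s, children_s, mu_t, children_t))  # True
print(extensionally_correct({("baroque", "baroque_t"): "equivalent"},
                            mu_s, children_s, mu_t, children_t))  # False
```

The second check fails because $\mu_s(\text{baroque}{\downarrow}) = \{d_2, d_3\}$ while $\mu_t(\text{baroque\_t}{\downarrow}) = \{d_2\}$. Under the same data, the weaker `compatible` relation would hold.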

The semantics introduced in Definition 4 can be viewed as an instance of the
compatibility relation between contexts as defined in Local Models Semantics [?,
?]. Indeed, suppose we take a set of documents $D$ as the domain of
interpretation of the local models of two contexts $c_1$ and $c_2$, and each concept as a unary
predicate. If we see the documents associated to a concept as the interpretation
of a predicate in a local model, then the relation we discover between concepts of
different contexts can be viewed as a compatibility constraint between the local
models of the two contexts. For example, if the algorithm returns an equivalence
between the concepts $k_1$ and $k_2$ in the contexts $c_1$ and $c_2$, then it can be
interpreted as the following constraint: if a local model of $c_1$ associates a document $d$
to $k_1$, then any compatible model of $c_2$ must associate $d$ to $k_2$ (and vice versa);
analogously for the other relations.



4 The Matching Algorithm 

The algorithm has two main phases:

Semantic explicitation. At the schema level, a lot of information is implicit
in the labels and in the structure. The objective of this first phase is to
make it as explicit as possible by associating to each node (and edge) $k$ a
logical formula $w(k)$ that encodes this information. Intuitively, $w(k)$ is an
approximation of the human interpretation.

Semantic comparison. We encode the problem of finding mappings between
two concepts $k$ and $k'$, whose explicit meanings are $w(k)$ and $w(k')$, into a
problem of satisfiability, which is then solved by a SAT solver in a logic $W$
(i.e., the logic in which $w(k)$ and $w(k')$ are expressed). Domain knowledge is
also encoded as a set of formulas of $W$.
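The SAT encoding itself is described later in the paper and is not reproduced in this excerpt. The sketch below is therefore only an illustration under a stated assumption: it uses the standard reduction of entailment to unsatisfiability (the domain axioms entail $w(k) \to w(k')$ exactly when the axioms together with $w(k)$ and $\neg w(k')$ are unsatisfiable). It also replaces a real SAT solver with brute-force enumeration of truth assignments, which is feasible only for a handful of atoms. The formula encoding, atom names, and axioms are all hypothetical.

```python
from itertools import product

# Formulas are nested tuples (an assumed encoding, not the paper's):
#   ("atom", name), ("not", f), ("and", f, g), ("or", f, g), ("implies", f, g)


def atom(name):
    return ("atom", name)


def atoms_of(f):
    """Set of atom names occurring in formula f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms_of(g) for g in f[1:]))


def holds(f, v):
    """Truth value of f under the assignment v (dict: atom name -> bool)."""
    op = f[0]
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return not holds(f[1], v)
    if op == "and":
        return holds(f[1], v) and holds(f[2], v)
    if op == "or":
        return holds(f[1], v) or holds(f[2], v)
    if op == "implies":
        return (not holds(f[1], v)) or holds(f[2], v)
    raise ValueError(f"unknown operator {op!r}")


def satisfiable(formulas):
    """Brute-force SAT check: try every assignment to the atoms involved."""
    names = sorted(set().union(*(atoms_of(f) for f in formulas)))
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if all(holds(f, v) for f in formulas):
            return True
    return False


def relation(w_k, w_k2, axioms=()):
    """Relation between concepts with meanings w(k) and w(k') under the axioms."""
    ax = list(axioms)
    sub = not satisfiable(ax + [w_k, ("not", w_k2)])  # axioms |= w(k) -> w(k')
    sup = not satisfiable(ax + [w_k2, ("not", w_k)])  # axioms |= w(k') -> w(k)
    if sub and sup:
        return "equivalent"
    if sub:
        return "less_general"
    if sup:
        return "more_general"
    if not satisfiable(ax + [w_k, w_k2]):
        return "disjoint"
    return "compatible"


# Hypothetical domain axioms of the kind a background theory might supply.
AXIOMS = [
    ("implies", atom("snake"), atom("reptile")),
    ("implies", atom("reptile"), atom("animal")),
    ("implies", atom("us_state"), ("not", atom("animal"))),
]

print(relation(atom("snake"), atom("animal"), AXIOMS))    # less_general
print(relation(atom("us_state"), atom("snake"), AXIOMS))  # disjoint
print(relation(atom("animal"), atom("pet"), AXIOMS))      # compatible
```

The order of the checks (equivalence, then the two subsumptions, then disjointness) is a design choice of this sketch. Degenerate cases, such as a concept whose meaning is inconsistent with the axioms, are not treated specially.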

Since here we are mainly focussed on the second phase, we only provide a
short description of semantic explicitation (details can be found in [?]) and
then move to the SAT encoding.



4.1 Semantic Explicitation 

The goal of the first phase is to make explicit all the semantic information which can be fruitfully used to define the SAT problem in a rich way. The main intuition is that any schema is interpreted (by its users) using two main sources of information: lexical information, which tells us that a word (or a phrase) can have multiple senses, synonyms, and so on; and a background theory, which provides extra-linguistic information about the concepts in the schema and about their relations. For example, lexical information about the word “Arizona” tells us that it can mean “a state in southwestern United States” or a “glossy snake”. The facts that snakes are animals (reptiles), that snakes are poisonous and so can be very dangerous, and so on, are part of a background theory which one has in mind when using the word “Arizona” to mean a snake^. In the version of the algorithm we present here, we use WordNet as a source of both lexical and background information about the labels in the schema. However, we would like to stress that the algorithm does not depend on the choice of any particular dictionary or theory (i.e., it does not depend on WordNet). Moreover, we do not assume that the same dictionary and background theory are used to make explicit the semantics of the two contexts to be matched.

Semantic explicitation is made in two main steps: linguistic interpretation 
and contextualization. 



Linguistic interpretation. Let H = ⟨K, E, l⟩ be a concept hierarchy and L_H the set of labels associated to the nodes and edges of H by the function l. In this phase we associate to each label s ∈ L_H a logical formula representing the interpretation of that label w.r.t. the background theory we use.

^ We are not saying here that there is only one background theory. On the contrary, theories tend to differ a lot from individual to individual, and this is part of the reason why communication can fail. What we are saying is that, to understand what “Arizona” means in a schema (such as the concept hierarchy in the left-hand side of Figure ?), one must have a theory in mind.
Definition 5 (Label interpretation). Given a logic W, a label interpretation in W is a function I : L_H → wff(W), where wff(W) is the set of well-formed formulas of W.

The choice of W depends on the external assumptions of the context containing H. For concept hierarchies, we adopted a description logic W with ⊔, ⊓ and ¬, whose primitive concepts are the synsets of WordNet that we associate to each label (with a suitable interpretation of conjunctions, disjunctions, multiwords, punctuation, and parentheses). For example, WordNet provides two senses for the label Arizona in Figure ?, denoted by #1 and #2; in this case, the output of the linguistic analysis is the following formula in W: Arizona#1 ⊔ Arizona#2.



Contextualization. Linguistic analysis of labels is definitely not enough. The 
phase of contextualization aims at pruning or enriching the synsets associated to 
a label in the previous phase by using the context in which this label occurs. In 
particular, we introduce the concept of focus of a concept k, namely the smallest 
subset of H which we need to consider to determine the meaning of k. What is 
in the focus of a concept depends on the structure of the explicit representation. 
For concept hierarchies, we use the following definition: 

Definition 6 (Focus). The focus of a concept k ∈ K in a concept hierarchy H = ⟨K, E, l⟩ is a finite concept hierarchy f(k, H) = ⟨K', E', l'⟩ such that: K' ⊆ K contains k, its ancestors, and their direct descendants; E' ⊆ E is the set of edges between the concepts of K'; l' is the restriction of l on K'.
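Definition 6 can be sketched in Python as follows. We assume (our choice, not the paper's) that a hierarchy is stored as a child-to-parent map, and we read "their direct descendants" as the children of k's ancestors, which brings in the siblings along the path from k to the root; `ancestors`, `focus` and the example hierarchy are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical hierarchy encoding: child -> parent (None for the root).
Parent = Dict[str, Optional[str]]


def ancestors(parent: Parent, k: str) -> List[str]:
    """Ancestors of k, from its parent up to the root."""
    result = []
    p = parent.get(k)
    while p is not None:
        result.append(p)
        p = parent.get(p)
    return result


def focus(parent: Parent, k: str) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Nodes K' and edges E' of the focus of k (Definition 6)."""
    anc = set(ancestors(parent, k))
    nodes = {k} | anc
    # Direct descendants of each ancestor (this adds the siblings along the path).
    nodes |= {child for child, p in parent.items() if p in anc}
    # E': the (parent, child) edges of H whose endpoints both lie in K'.
    edges = {(p, child) for child, p in parent.items()
             if p is not None and child in nodes and p in nodes}
    return nodes, edges


hierarchy = {
    "Arts": None,
    "Literature": "Arts",
    "Music": "Arts",
    "Chat and Forum": "Literature",
    "Poetry": "Literature",
    "Jazz": "Music",
}
nodes, edges = focus(hierarchy, "Chat and Forum")   # "Jazz" is left out of the focus
```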

The contextualization of the interpretation of a concept k of a context c is a formula w(k), called the contextualized interpretation of k, which is computed by combining the linguistic interpretations associated to each concept h in the focus of k. The two main operations performed to compute w(k) are sense filtering and sense composition.

Sense filtering uses NL techniques to discard synsets that are not likely to be correct for a label in a given focus. For example, the sense of Arizona as a snake can be discarded, as it does not bear any explicit relation with the synsets of the other labels in the focus (e.g., with the synsets of United States), whereas the sense of Arizona as a state bears a part-of relation with United States#1 (analogously, we can remove synsets of United States).

Sense composition enriches the meaning of a concept in a context by combining its linguistic interpretation with structural information and background theory. For concept hierarchies, we adopted the default rule that the contextual meaning of a concept k is formalized as the conjunction of the senses associated to k and to all its ancestors. Furthermore, some interesting exceptions are handled. An example: in the Yahoo Directory, Visual arts and Photography are sibling nodes under Arts & Humanities; since in WordNet photography is in an is-a relationship with visual art, the node Visual arts is re-interpreted as visual arts minus photography, and is then formalized in description logic as: visual art#1 ⊓ ¬photography#1.
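A minimal sketch of the default composition rule and of the sibling exception, assuming the senses that survive filtering are already known and that WordNet's is-a links are supplied as a set of (hyponym, hypernym) pairs. Formulas are built as plain strings, the "&" in Arts & Humanities is read as a disjunction, and sense names use underscores (visual_art#1); these choices, and the function names, are illustrative assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Parent = Dict[str, Optional[str]]   # child -> parent (None for a root)
Senses = Dict[str, List[str]]       # node -> WordNet senses kept after filtering
IsA = Set[Tuple[str, str]]          # (hyponym sense, hypernym sense) pairs


def label_formula(senses: List[str]) -> str:
    """Disjunction of the senses of one label."""
    return senses[0] if len(senses) == 1 else "(" + " ⊔ ".join(senses) + ")"


def node_formula(node: str, senses: Senses, parent: Parent, is_a: IsA) -> str:
    """Senses of a node, minus any sibling sense that WordNet places below one of them."""
    formula = label_formula(senses[node])
    for sibling, p in parent.items():
        if sibling == node or p != parent[node]:
            continue
        for s_sib in senses[sibling]:
            if any((s_sib, s) in is_a for s in senses[node]):
                formula += f" ⊓ ¬{s_sib}"
    return formula


def contextual_formula(node: str, senses: Senses, parent: Parent, is_a: IsA) -> str:
    """Default rule: conjunction over the node and all of its ancestors, root first."""
    parts = []
    current: Optional[str] = node
    while current is not None:
        parts.append(node_formula(current, senses, parent, is_a))
        current = parent[current]
    return " ⊓ ".join(reversed(parts))


parent = {"Arts & Humanities": None,
          "Visual arts": "Arts & Humanities",
          "Photography": "Arts & Humanities"}
senses = {"Arts & Humanities": ["art#1", "humanities#1"],
          "Visual arts": ["visual_art#1"],
          "Photography": ["photography#1"]}
is_a = {("photography#1", "visual_art#1")}

w_visual_arts = contextual_formula("Visual arts", senses, parent, is_a)
# '(art#1 ⊔ humanities#1) ⊓ visual_art#1 ⊓ ¬photography#1'
```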
4.2 Computing Relations between Concepts via SAT

In the second phase of the algorithm, the problem of discovering the relationship between a concept k in a context c and a concept k' in a context c' is reduced to the problem of checking, via SAT, a set of logical relations between the formulas w(k) and w(k') associated to k and k'. The SAT problem is built in two steps. First, we select the portion T of the background theory relevant to the contextualized interpretations w(k) and w(k'); then we compute the logical relations between w(k) and w(k') that are implied by T.

Definition 7. Let φ = w(k) and ψ = w(k') be the contextualized interpretations of two concepts k and k' of two contexts c and c', respectively. Let B be a theory (= logically closed set of axioms) in the logic where φ and ψ are expressed. The portion of B relevant to φ and ψ is a subset T of B such that T contains all the axioms of B containing some concept occurring in φ or ψ.
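As a rough illustration of Definition 7, the sketch below selects, from a finite list of axioms, those that mention a sense occurring in φ or ψ. It assumes (hypothetically) that formulas are strings and that sense names match the pattern name#number; a real background theory is logically closed, so the finite list here only stands in for its generating axioms.

```python
import re
from typing import List, Set, Tuple

# An axiom is (left concept, connective, right concept), e.g. ("art#1", "⊑", "humanities#1").
Axiom = Tuple[str, str, str]
SENSE = re.compile(r"[A-Za-z_]+#\d+")   # a sense name such as art#1 or visual_art#1


def concepts(formula: str) -> Set[str]:
    """Sense names occurring in a formula written as a string."""
    return set(SENSE.findall(formula))


def relevant_portion(theory: List[Axiom], phi: str, psi: str) -> List[Axiom]:
    """Axioms of the theory that mention some concept occurring in phi or psi."""
    occurring = concepts(phi) | concepts(psi)
    return [ax for ax in theory if ax[0] in occurring or ax[2] in occurring]


B = [("art#1", "⊑", "humanities#1"),
     ("literature#2", "⊑", "humanities#1"),
     ("snake#1", "⊑", "reptile#1")]
phi = "art#1 ⊓ literature#2 ⊓ (chat#1 ⊔ forum#1)"
psi = "(art#1 ⊔ humanities#1) ⊓ humanities#1 ⊓ (chat#1 ⊔ forum#1)"
T = relevant_portion(B, phi, psi)   # keeps the two axioms about art and literature
```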

Clearly, different contexts can be associated to different background theories, which encode general and domain-specific information. This information is stored in the context external assumptions under the field “domain”. Furthermore, when we determine the mapping between two contexts c_s and c_t, we can take the perspective (i.e., the background theory) of the source or that of the target. The two perspectives might indeed not coincide. This justifies the introduction of directionality in the mapping: c_s →⊒ c_t means that c_s is more general than c_t according to the target perspective, while the relation c_t →⊑ c_s represents the fact that c_s is more general than c_t according to the source perspective.

In the first version of our matching algorithm we consider a background theory B determined by transforming the WordNet relations into a set of axioms in description logic, as shown in Table 1. In this table we introduce the notation =_w, <_w, >_w and ⊥_w to represent the following relations between senses stored in WordNet:

1. s#k =_w t#h: s#k and t#h are synonyms (i.e., they are in the same synset);

2. s#k <_w t#h: s#k is either a hyponym or a meronym of t#h;

3. s#k >_w t#h: s#k is either a hypernym or a holonym of t#h;

4. s#k ⊥_w t#h: s#k belongs to the set of opposite meanings of t#h (if s#k and t#h are adjectives) or, in the case of nouns, s#k and t#h are different hyponyms of the same synset.
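The encoding of Table 1 amounts to one axiom template per WordNet relation. This sketch assumes the relations have already been read off WordNet as (sense, relation, sense) triples; the dictionary `AXIOM_TEMPLATE`, the function `encode` and the sample facts are hypothetical, and the disjointness row is written as t#k ⊑ ¬s#h.

```python
from typing import Callable, Dict, List, Tuple

# One T-Box axiom template per WordNet relation listed above.
AXIOM_TEMPLATE: Dict[str, Callable[[str, str], str]] = {
    "=w": lambda t, s: f"{t} ≡ {s}",    # synonyms
    "<w": lambda t, s: f"{t} ⊑ {s}",    # hyponym or meronym
    ">w": lambda t, s: f"{t} ⊒ {s}",    # hypernym or holonym
    "⊥w": lambda t, s: f"{t} ⊑ ¬{s}",   # opposite or sibling senses, read as disjointness
}


def encode(facts: List[Tuple[str, str, str]]) -> List[str]:
    """Turn (sense, relation, sense) facts into description-logic axioms."""
    return [AXIOM_TEMPLATE[rel](t, s) for t, rel, s in facts]


# Hypothetical WordNet facts, written with the notation introduced above.
facts = [("art#1", "<w", "humanities#1"),
         ("humanities#1", ">w", "literature#2"),
         ("hot#1", "⊥w", "cold#1")]
axioms = encode(facts)
```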

In the extraction of the theory B from WordNet we adopt a heuristic which turns out to perform satisfactorily (see the section on experimentation and evaluation). However, different sources, such as specific domain ontologies or domain taxonomies, and different heuristics can be used to build the theory B from which T is extracted.

Table 1. Encoding WordNet relations in T-Box axioms

WordNet relation    Domain axiom
t#k =_w s#h         t#k ≡ s#h
t#k <_w s#h         t#k ⊑ s#h
t#k >_w s#h         t#k ⊒ s#h
t#k ⊥_w s#h         t#k ⊑ ¬s#h

Table 2. Verifying relations as a SAT problem

Relation        SAT problem
k_s →⊒ k_t      T_t ⊨ w(k_t) ⊑ w(k_s)
k_s →⊑ k_t      T_t ⊨ w(k_s) ⊑ w(k_t)
k_s →⊥ k_t      T_t ⊨ w(k_s) ⊓ w(k_t) ⊑ ⊥
k_s →≡ k_t      T_t ⊨ w(k_t) ⊑ w(k_s) and T_t ⊨ w(k_s) ⊑ w(k_t)
k_s →* k_t      w(k_s) ⊓ w(k_t) is consistent in T_t

Going back to how we build the theory B, suppose, for example, that we want to discover the relation between Chat and Forum in the Google directory and Chat and Forum in the Yahoo directory in Figure ?. From WordNet we can extract the following relevant axioms:

art#1 ⊑ humanities#1

(sense 1 of ‘art’ is a hyponym of sense 1 of ‘humanities’), and

humanities#1 ⊒ literature#2

(sense 1 of ‘humanities’ is a hypernym of sense 2 of ‘literature’).

The axioms extracted from WordNet can now be used to check what mapping (if any) exists between k and k' by looking at their contextualized interpretations. But which logical relations between w(k) and w(k') encode a mapping between k and k' as defined above? Again, the encoding of the mapping into a logical relation is a matter of heuristics. Here we propose the translation described in Table 2. In this table, T_t is the portion of the background theory of c_t relevant to k_s and k_t. The idea behind this translation is to see WordNet senses (contained in w(k) and w(k')) as sets of documents. For instance, the concept art#1, corresponding to the first WordNet sense of art, is thought of as the set of documents speaking about art in the first sense. Using the set-theoretic interpretation of mappings given above, a mapping can be translated in terms of subsumption between w(k) and w(k'). Indeed, the subsumption relation semantically corresponds to the subset relation.

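So, the problem of checking whether Chat and Forum in Google is, say, less general than Chat and Forum in Yahoo amounts to a problem of satisfiability on the following formulas:

$$\mathit{art\#1} \sqsubseteq \mathit{humanities\#1} \tag{1}$$

$$\mathit{humanities\#1} \sqsupseteq \mathit{literature\#2} \tag{2}$$

$$\mathit{art\#1} \sqcap \mathit{literature\#2} \sqcap (\mathit{chat\#1} \sqcup \mathit{forum\#1}) \tag{3}$$

$$(\mathit{art\#1} \sqcup \mathit{humanities\#1}) \sqcap \mathit{humanities\#1} \sqcap (\mathit{chat\#1} \sqcup \mathit{forum\#1}) \tag{4}$$

It is easy to see that from the axioms (1) and (2) we can infer (3) ⊑ (4); equivalently, (3) ⊓ ¬(4) is unsatisfiable with respect to (1) and (2).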
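Because the formulas above use only atomic senses and the Boolean constructors ⊓, ⊔ and ¬, each sense can be treated as a propositional atom and each axiom A ⊑ B as the implication A → B; under that reading T ⊨ C ⊑ D holds exactly when T ∧ C ∧ ¬D is unsatisfiable. The sketch below applies the tests of Table 2 to the Chat and Forum example, with brute-force enumeration of assignments standing in for a real SAT solver. The function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Set

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]

ATOMS = ["art#1", "humanities#1", "literature#2", "chat#1", "forum#1"]


def assignments(atoms: List[str]) -> Iterator[Assignment]:
    """All truth assignments to the atoms (brute force instead of a SAT solver)."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def subsumed(t: Formula, c: Formula, d: Formula, atoms: List[str]) -> bool:
    """T ⊨ C ⊑ D iff T ∧ C ∧ ¬D is unsatisfiable."""
    return not any(t(v) and c(v) and not d(v) for v in assignments(atoms))


def consistent(t: Formula, c: Formula, atoms: List[str]) -> bool:
    """C is consistent in T iff T ∧ C is satisfiable."""
    return any(t(v) and c(v) for v in assignments(atoms))


def table2(t: Formula, ws: Formula, wt: Formula, atoms: List[str]) -> Set[str]:
    """Relations of Table 2 that hold between w(k_s) and w(k_t) given T_t."""
    both: Formula = lambda v: ws(v) and wt(v)
    rels = set()
    more = subsumed(t, wt, ws, atoms)
    less = subsumed(t, ws, wt, atoms)
    if more:
        rels.add("more general")
    if less:
        rels.add("less general")
    if more and less:
        rels.add("equivalent")
    if consistent(t, both, atoms):
        rels.add("compatible")
    else:
        rels.add("disjoint")          # T_t ⊨ w(k_s) ⊓ w(k_t) ⊑ ⊥
    return rels


def T_t(v: Assignment) -> bool:
    # art#1 ⊑ humanities#1 and humanities#1 ⊒ literature#2, read as implications
    return (not v["art#1"] or v["humanities#1"]) and (not v["literature#2"] or v["humanities#1"])


def w_google(v: Assignment) -> bool:
    return v["art#1"] and v["literature#2"] and (v["chat#1"] or v["forum#1"])


def w_yahoo(v: Assignment) -> bool:
    return (v["art#1"] or v["humanities#1"]) and v["humanities#1"] and (v["chat#1"] or v["forum#1"])


result = table2(T_t, w_google, w_yahoo, ATOMS)   # {"less general", "compatible"}
```

Swapping the two arguments yields "more general" and "compatible", which illustrates that the check is directional.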

To each relation it is also possible to associate a quantitative measure. For instance, the relation “c is compatible with d” can be associated with a degree, representing the percentage of models that satisfy φ ⊓ ψ among the models that satisfy […]. Another example is the measure that can be associated to the relation “c is more general than d”, which is the percentage of the models that satisfy φ among the models that satisfy ψ. This measure gives a first estimation of how much ψ is a generalization of φ: the lower the percentage, the higher the generalization.
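A minimal sketch of the second measure, assuming φ and ψ are propositional formulas over a small finite set of atoms and that "models" means truth assignments to those atoms (no background theory is taken into account). The function returns the share of ψ's models that also satisfy φ; the name `generality_ratio` is hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Assignment = Dict[str, bool]


def generality_ratio(phi: Callable[[Assignment], bool],
                     psi: Callable[[Assignment], bool],
                     atoms: List[str]) -> Optional[float]:
    """Fraction of the assignments satisfying psi that also satisfy phi (None if psi is unsatisfiable)."""
    all_assignments = (dict(zip(atoms, bits))
                       for bits in product([False, True], repeat=len(atoms)))
    psi_models = [v for v in all_assignments if psi(v)]
    if not psi_models:
        return None
    return sum(1 for v in psi_models if phi(v)) / len(psi_models)


def phi(v: Assignment) -> bool:
    return v["a"]


def psi(v: Assignment) -> bool:
    return v["a"] or v["b"]


ratio = generality_ratio(phi, psi, ["a", "b"])   # 2 of the 3 models of a ∨ b satisfy a
```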

5 Testing the Algorithm 

In this section we briefly report from [?] the results of the first tests of the algorithm. We observe that the tests are performed on real schemas (i.e., pre-existing schemas that we found in real applications), and not on schemas created ad hoc.






5.1 Experiment 1: Generating Google’s Links 

The first test uses the Google web directory. It can be viewed as a concept hierarchy in which some paths in the hierarchical structure are linked to other paths (links are marked by the @-sign in the Google web page), a mechanism that allows “jumping” from one path to another in the hierarchy (a sort of symbolic link in a Unix file system). Our hypothesis is that these links can be viewed as human-defined relations between concepts, and thus can be used to validate the results of running our algorithm between concepts of the Google directory as if they were concepts of different contexts.

Since the Google directory is very large, the test was performed on the News sub-hierarchy, as it is relatively small and well covered by WordNet. The results of computing 1740² (about 3,000,000) mappings are summarized and compared with Google’s mappings in the following table:



Description            Exact links found    %      Wrong links found
Equivalence            7                    5%     4
More + less general    3+81                 56%    688
Total                  91                   61%    692
The table must be read as follows: Google provides 151 links in total; 61% of them were found by the algorithm, while 60 links (39%) were not found. Regarding the wrong links found, in four cases the algorithm found an equivalence between concepts that were not linked in Google; we manually checked these cases, and concluded that the results of the algorithm were extremely plausible, and that the two concepts could be correctly linked in Google. For example, the algorithm found that the concept News/Media/Media Producers/Television is equivalent to News/Media/Media Producers/Video, based on the fact that one of the senses of television in WordNet has video among its synonyms. The algorithm was not very accurate for the other two relations (precision = 11%), even though a manual verification of the “false positives” led us to conclude that in most cases they could be valuable suggestions for new Google links.
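The percentages quoted in the text follow from the counts in the table; the short Python computation below reproduces them (the variable names are ours).

```python
# Counts taken from the table above; everything else is plain arithmetic.
google_links = 151
found = {"equivalence": 7, "more/less general": 3 + 81}
wrong = {"equivalence": 4, "more/less general": 688}

total_found = sum(found.values())          # 91 links recovered
not_found = google_links - total_found     # 60 links missed
coverage = total_found / google_links      # about 0.603
precision = {rel: found[rel] / (found[rel] + wrong[rel]) for rel in found}
# precision["more/less general"] is about 0.109, i.e. the 11% quoted in the text
mappings_computed = 1740 ** 2              # 3,027,600, "about 3,000,000"
```

Note that 91/151 is about 60.3%; the 61% in the table equals the sum of the rounded per-row figures (5% + 56%).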

5.2 Experiment 2: Matching Google with Yahoo! 

The aim of this experiment was to evaluate the CtxMatch algorithm over pairs 
of overlapping structures from Google and Yahoo!. The test was performed on 
two pairs, those with root ‘Architecture’ and ‘Medicine’. The results, expressed 
in terms of precision and recall, are reported in the following table: 





                     Architecture     Medicine
Relations            Pre.   Rec.      Pre.   Rec.
equivalence          .71    .10       .78    .13
less general than    .85    .49       .88    .46
more general than    .51    .91       .60    .78



We observe that a content-based interpretation of contextual knowledge allows the discovery of non-trivial mappings. For example, an inclusion mapping was computed between Architecture/History/Periods_and_Styles/Gothic/Gargoyles and Architecture/History/Medieval as a consequence of the relation between Medieval and Gothic that can be found in WordNet.

5.3 Experiment 3: Product Re-classification

The third test was in the domain of e-commerce. In the framework of a collaboration with a worldwide telecommunication company, the matching algorithm was applied to re-classify the catalog of office equipment and accessories (used to classify company suppliers) into UNSPSC^ (version 5.0.2). The validity of the relations found by the algorithm, shown in the following table, was double-checked manually.

^ UNSPSC (Universal Standard Products and Services Classification) is an open global 
coding system that classifies products and services. UNSPSC is extensively used 
around the world for electronic catalogs, search engines, e-procurement applications 
and accounting systems. 
                        Automatic classification^    After manual revision^
Total items found       324    100%                  324    100%
Correctly classified    197    60%                   246    76%
Wrongly classified      67     21%                   17     5%
Not classified          60     19%                   61     19%



In particular, the automatic classification percentages are computed by comparing the algorithm's results with the pre-existing mappings. After manual review, the mappings automatically discovered by the algorithm turned out to improve on the manual ones.

6 Related Work 

Rahm and Bernstein [?] suggest that there are three general strategies for matching schemas: instance-based (using similarity between the objects (e.g., documents) associated to the schema to infer the relationship between the concepts); schema-based (determining the relationships between concepts by analyzing the structure of a hierarchy and the meanings of the labels); and hybrid (a combination of the two strategies above). Our algorithm falls in the second group. In this section, we briefly compare our method with some of the most promising methods recently proposed, namely MOMIS [?], a schema-based semi-automatic matcher; CUPID [?], a schema-based automatic matcher; and GLUE, an instance-based automatic matcher.

MOMIS (Mediator environment for Multiple Information Sources) [?] is a framework to perform information extraction and integration from both structured and semistructured data sources. It takes a global-as-view approach by defining a global integrated schema, starting from a set of source schemas. In one of the first phases of the integration, MOMIS supports the discovery of overlapping (relations) between the different source schemas. This is done by exploiting the knowledge in a Common Thesaurus with a combination of clustering techniques and Description Logics. Another difference between the matching algorithm implemented in MOMIS and CtxMatch is that MOMIS includes an interactive process as a step of the integration procedure, and thus does not support a fully automatic and run-time generation of mappings.

More similar to CtxMatch is the algorithm proposed in [?], called CUPID. This is an algorithm for generic schema matching, based on a weighted combination of names, data types, constraints and structural matching. This algorithm uses a limited amount of linguistic knowledge, as it associates a thesaurus to each schema. However, unlike CtxMatch, it does not exploit the whole power of a linguistic resource like WordNet. Another difference between CUPID and CtxMatch is that CUPID discovers relations between two schemas S and T only when S and the embedding of S' in T are structurally isomorphic. As a consequence, CUPID cannot deal with concepts that are intuitively equivalent but are represented as non-isomorphic schemas.

^ Manually verified by ourselves.

^ Manually verified by Alessandro Cederle, Managing Director of Kompass Italia.

A different approach to ontology matching has been proposed in [?]. Although the aim of the work (i.e., establishing mappings among concepts of overlapping ontologies) is in many respects similar to ours, the methodologies are significantly different. A major difference is that the GLUE system builds mappings taking advantage of information contained in instances, while the current version of the CtxMatch algorithm completely ignores them. This makes CtxMatch more appealing, since most ontologies currently available in the Semantic Web do not contain a significant collection of instances. A second difference concerns the use of domain-dependent constraints, which, in the case of the GLUE system, need to be provided manually by domain experts, while in CtxMatch they are automatically extracted from an already existing resource (i.e., WordNet). Finally, CtxMatch provides a qualitative characterization of mappings in terms of the relation between two concepts, a feature which is not considered in GLUE. Even though a comparison with the results reported in [?] is rather difficult, the accuracy achieved by CtxMatch can be roughly compared with the accuracy of the GLUE module which uses less information (i.e., the “name learner”).

7 Conclusions 

In this paper, we presented a first version of an algorithm for matching semantic schemas (viewed as contexts) via SAT.

We believe that this work can have a significant impact from a theoretical point of view. Indeed, the scientific challenge behind the algorithm is to determine the minimal common ground needed to enable communication between entities that do not share common meanings (at least, not in the sense of the approaches that assume the necessity of a shared ontology to enable communication). As a consequence, the relations discovered by the algorithm are always directional (from a concept in a context to a concept in another context, but not vice versa), and this reflects the idea that what is a good mapping from the point of view encoded in one context might not be acceptable from the point of view encoded in the other context.

Of course, a lot of work remains to be done, in particular: generalizing the types of structures we can match (beyond concept hierarchies); taking into account a larger collection of explicit assumptions; and going beyond WordNet as a source of linguistic and domain knowledge.
Abstract. In a previous paper, we proposed a first formal and conceptual comparison between the two most important formalizations of context in AI: Propositional Logic of Context (PLC) and Local Models Semantics/MultiContext Systems (LMS/MCS). The result was that LMS/MCS is at least as general as PLC, as PLC can be embedded into a particular class of MCS, called MPLC. In this paper we go beyond that result, and prove that, under some important restrictions (including the hypothesis that each context has finite and homogeneous propositional languages), MCS can be embedded in PLC with generic axioms. In proving this theorem, we also show that MCS cannot be embedded in PLC using only lifting axioms to encode bridge rules. This is an important result for a general theory of context and contextual reasoning, as it proves that lifting axioms and entering context are not enough to capture all forms of contextual reasoning that can be captured via bridge rules in LMS/MCS.



1 Introduction 

This paper continues the investigation of formal theories of context we started in [?]. In that paper, we compared two well-known formalizations of context, namely the Propositional Logic of Context (PLC) and Local Models Semantics (LMS) [?], axiomatized via MultiContext Systems [?] (MCS)^. The main technical result was that LMS/MCS is at least as general as PLC, as PLC can be embedded into a particular class of MCS, called MPLC.

In this paper we go beyond that result, and analyze the claim that LMS/MCS is strictly more general than PLC. The main technical results are the following: (i) under some important restrictions (including the hypothesis that each context has finite and homogeneous propositional languages), LMS/MCS can be embedded in PLC with generic axioms; (ii) LMS/MCS cannot be embedded in PLC using only lifting axioms to encode bridge rules. These results are important for a general theory of context and contextual reasoning in two senses: first, the restrictions needed to prove the first theorem have a significant impact on the fulfillment of the intuitive desiderata that were brought forward to motivate the formalization of context in AI (e.g., in [?]); second, they prove that lifting axioms and entering context are not enough to capture all forms of contextual reasoning that can be captured via bridge rules in LMS/MCS.

^ Hereafter, we will refer to the general framework of LMS together with its axiomatization via MCS as LMS/MCS.

2 The Two Systems: PLC and LMS/MCS 

In this section we quickly review the two formalisms, and prepare the ground for the technical comparison between them^.



2.1 Propositional Logic of Context 

In this paper, we use the version of PLC presented in [?]. Given a set 𝕂 of labels, intuitively denoting contexts, the language of PLC is a multi-modal language on a set of atomic propositions P with the modality ist(κ, φ) for each context (label) κ ∈ 𝕂. More formally, the set W of well-formed formulae of PLC, based on P, is defined by

$$W ::= P \mid \neg W \mid W \supset W \mid \mathit{ist}(\kappa, W)$$

The other propositional connectives are defined as usual. If κ is a context, then the formula ist(κ, φ) can be read as: φ is true in the context κ. PLC allows one to describe how a context is viewed from another context. To this end, PLC introduces sequences of contexts (labels). Let 𝕂* denote the set of finite context sequences and let κ̄ = κ_1 … κ_n denote any (possibly empty) element of 𝕂*. The sequence of contexts κ_1κ_2 represents how context κ_2 is viewed from context κ_1. Therefore, the intuitive meaning of the formula ist(κ_2, φ) in the context κ_1 is that φ holds in the context κ_2 from the point of view of κ_1. A similar interpretation can be given to formulae in sequences of contexts longer than 2. A model for PLC associates a set of partial truth assignments to a subset of context sequences, and satisfiability is defined with respect to a context sequence.

Definition 1. A model 𝔐 of PLC is a partial function which maps context sequences in 𝕂* into a set of partial truth assignments for P:

$$\mathfrak{M} \in \left(\mathbb{K}^* \to_p \mathcal{P}(P \to_p \{\mathit{true}, \mathit{false}\})\right)$$

where A →_p B denotes the set of partial functions from A to B and 𝒫(A) denotes the powerset of A.

The original intuition was that partial truth assignments allow us to represent the fact that in different context sequences there are different sets of meaningful formulae. Indeed, a model 𝔐 defines a vocabulary, denoted by Vocab(𝔐), namely a function that associates to each context sequence a set of meaningful formulae. Formally, a vocabulary is a relation Vocab ⊆ 𝕂* × P that associates a subset of primitive propositions with each context sequence. Vocab(𝔐), i.e., the vocabulary defined by the model 𝔐, is the function that associates to each context sequence κ̄ the subset of P for which all the assignments in 𝔐(κ̄) are defined. That is, (κ̄, p) ∈ Vocab(𝔐) if and only if 𝔐(κ̄) is defined and, for all ν ∈ 𝔐(κ̄), ν(p) is defined (where ν is a truth assignment to atomic propositions).

^ An exhaustive presentation of the two formalisms is beyond the scope of this paper; interested readers can refer to the bibliography for more details.

Satisfiability and validity of formulae are defined only for those models that provide enough vocabulary, i.e., the vocabulary which is necessary to evaluate a formula in a context sequence. Each formula φ in a context sequence κ̄ implicitly defines its vocabulary, denoted by Vocab(κ̄, φ), which intuitively consists of the minimal vocabulary necessary to build the formula φ in the context sequence κ̄. More formally, Vocab(κ̄, φ) is recursively defined as follows:



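$$
\begin{aligned}
\mathrm{Vocab}(\bar\kappa, p) &= \{(\bar\kappa, p)\}\\
\mathrm{Vocab}(\bar\kappa, \neg\varphi) &= \mathrm{Vocab}(\bar\kappa, \varphi)\\
\mathrm{Vocab}(\bar\kappa, \varphi \supset \psi) &= \mathrm{Vocab}(\bar\kappa, \varphi) \cup \mathrm{Vocab}(\bar\kappa, \psi)\\
\mathrm{Vocab}(\bar\kappa, \mathit{ist}(\kappa, \varphi)) &= \mathrm{Vocab}(\bar\kappa\kappa, \varphi)
\end{aligned}
$$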
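The recursive definition of Vocab(κ̄, φ) translates directly into a recursive function. The sketch assumes a hypothetical tuple encoding of formulas ("atom", "not", "imp", "ist") and represents context sequences as Python tuples, so entering a context is tuple concatenation.

```python
from typing import FrozenSet, Tuple

# Hypothetical formula encoding: ("atom", p), ("not", f), ("imp", f, g), ("ist", kappa, f).
# A context sequence is a tuple of labels; () is the empty sequence.
Seq = Tuple[str, ...]


def vocab(seq: Seq, f: tuple) -> FrozenSet[Tuple[Seq, str]]:
    """Vocab(seq, f), following the four recursive clauses above."""
    tag = f[0]
    if tag == "atom":
        return frozenset({(seq, f[1])})
    if tag == "not":
        return vocab(seq, f[1])
    if tag == "imp":
        return vocab(seq, f[1]) | vocab(seq, f[2])
    if tag == "ist":
        return vocab(seq + (f[1],), f[2])   # entering context f[1] extends the sequence
    raise ValueError(f"unknown connective {tag!r}")


# p ⊃ ist(k2, q), considered in the context sequence k1
example = vocab(("k1",), ("imp", ("atom", "p"), ("ist", "k2", ("atom", "q"))))
# frozenset({(('k1',), 'p'), (('k1', 'k2'), 'q')})
```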

Definition 2 (Satisfiability and Validity). Let φ and 𝔐 be a formula and a model, respectively. φ is satisfied in 𝔐 by an assignment ν ∈ 𝔐(κ̄) (notationally 𝔐, ν ⊨_κ̄ φ) according to the following clauses:

1. 𝔐, ν ⊨_κ̄ p iff ν(p) = true;

2. 𝔐, ν ⊨_κ̄ ¬φ iff not 𝔐, ν ⊨_κ̄ φ;

3. 𝔐, ν ⊨_κ̄ φ ⊃ ψ iff not 𝔐, ν ⊨_κ̄ φ or 𝔐, ν ⊨_κ̄ ψ;

4. 𝔐, ν ⊨_κ̄ ist(κ, φ) iff for all ν' ∈ 𝔐(κ̄κ), 𝔐, ν' ⊨_κ̄κ φ;

5. 𝔐 ⊨_κ̄ φ iff for all ν ∈ 𝔐(κ̄), 𝔐, ν ⊨_κ̄ φ;

6. ⊨_κ̄ φ iff for all PLC-models 𝔐 such that Vocab(κ̄, φ) ⊆ Vocab(𝔐), 𝔐 ⊨_κ̄ φ.

φ is valid in a context sequence κ̄ if ⊨_κ̄ φ; φ is satisfiable in a context sequence κ̄ if there is a PLC-model 𝔐 such that 𝔐 ⊨_κ̄ φ. A set of formulae Γ is satisfiable at a context sequence κ̄ if there is a model 𝔐 such that 𝔐 ⊨_κ̄ φ for all φ ∈ Γ.

According to the above definition, vocabularies affect truth in contexts, making each formula outside the vocabulary false. This implies that a PLC-model 𝔐 presents a non-classical semantics for all the formulas φ such that Vocab(κ̄, φ) ⊈ Vocab(𝔐). For instance, if (κ̄, p) ∉ Vocab(𝔐), then 𝔐 ⊭_κ̄ p ∨ ¬p. This “non-classical” effect, however, disappears in the definition of validity, since the validity of a formula φ is checked by considering only the models whose vocabularies contain Vocab(κ̄, φ). This means that validity and satisfiability can be formulated by considering only PLC-models with complete vocabularies, i.e., PLC-models 𝔐 with (κ̄, p) ∈ Vocab(𝔐) for each p ∈ P and κ̄ ∈ 𝕂*.

Theorem 1 (Reduction to complete vocabulary). A formula is valid in 
PLC if and only if it is satisfied by all the PLC-models with complete vocabulary. 
Similarly, a formula is satisfiable in PLC if and only if there is a PLC-model 
with complete vocabulary that satisfies it. 
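Clauses 1 to 5 of Definition 2 can be prototyped as a small evaluator. This sketch assumes complete vocabularies (every assignment is defined on every atom used), which by Theorem 1 is sufficient for validity and satisfiability, and it treats a context sequence missing from the model as having no assignments, so ist formulas about it hold vacuously. Both choices, the tuple encoding of formulas, and the function names are ours.

```python
from typing import Dict, List, Tuple

Seq = Tuple[str, ...]
Assignment = Dict[str, bool]
Model = Dict[Seq, List[Assignment]]   # context sequence -> its set of assignments (as a list)


def sat(m: Model, seq: Seq, v: Assignment, f: tuple) -> bool:
    """m, v ⊨_seq f: clauses 1-4 of Definition 2 (complete vocabularies assumed)."""
    tag = f[0]
    if tag == "atom":
        return v[f[1]]                                         # clause 1
    if tag == "not":
        return not sat(m, seq, v, f[1])                        # clause 2
    if tag == "imp":
        return (not sat(m, seq, v, f[1])) or sat(m, seq, v, f[2])   # clause 3
    if tag == "ist":
        inner = seq + (f[1],)                                  # clause 4: move to seq·kappa
        return all(sat(m, inner, w, f[2]) for w in m.get(inner, []))
    raise ValueError(f"unknown connective {tag!r}")


def model_sat(m: Model, seq: Seq, f: tuple) -> bool:
    """m ⊨_seq f (clause 5): f holds under every assignment in m(seq)."""
    return all(sat(m, seq, v, f) for v in m.get(seq, []))


M: Model = {
    (): [{"p": True, "q": False}],
    ("k",): [{"p": False, "q": True}, {"p": True, "q": True}],
}
ist_q = ("ist", "k", ("atom", "q"))   # q holds in every assignment of context k
ist_p = ("ist", "k", ("atom", "p"))   # p fails in one assignment of context k
```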
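$$
\begin{array}{ll}
(PL) & \vdash_{\bar\kappa} \varphi \quad \text{if } \varphi \text{ is an instance of a classical tautology}\\[4pt]
(K) & \vdash_{\bar\kappa} \mathit{ist}(\kappa, \varphi \supset \psi) \supset (\mathit{ist}(\kappa, \varphi) \supset \mathit{ist}(\kappa, \psi))\\[4pt]
(\Delta) & \vdash_{\bar\kappa} \mathit{ist}(\kappa_1, \mathit{ist}(\kappa_2, \varphi) \vee \psi) \supset \mathit{ist}(\kappa_1, \mathit{ist}(\kappa_2, \varphi)) \vee \mathit{ist}(\kappa_1, \psi)\\[8pt]
(MP) & \dfrac{\vdash_{\bar\kappa} \varphi \qquad \vdash_{\bar\kappa} \varphi \supset \psi}{\vdash_{\bar\kappa} \psi}\\[12pt]
(CS) & \dfrac{\vdash_{\bar\kappa\kappa} \varphi}{\vdash_{\bar\kappa} \mathit{ist}(\kappa, \varphi)}
\end{array}
$$

Fig. 1. Axioms and inference rules for PLC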



Ignoring vocabularies, PLC is the multi-modal system K on the set of propositions P, extended with the axiom (Δ). Indeed, the Hilbert-style axiomatization of validity proposed in [?], presented in Figure 1, is the modal system K extended with the axiom (Δ).

2.2 Local Models Semantics and Multi-context Systems 

The version of LMS we present here was presented in [7]. Let {L_i}_{i∈I} be a family of languages defined over a set of indexes I (in the following we drop the index i ∈ I). Intuitively, each L_i is the (formal) language used to describe the facts in the context i. In this paper, we assume that I is (at most) countable. Let M_i be the class of all the models (interpretations) of L_i. We call m ∈ M_i a local model (of L_i).

To distinguish the formula φ occurring in the context i from the occurrences of the “same” formula φ in the other contexts, we write i : φ. We say that i : φ is a labelled wff, and that φ is an L_i-wff. For any set of labelled formulae Γ, let

$$\Gamma_i = \{\varphi \mid i : \varphi \in \Gamma\}.$$

Definition 3 (Compatibility chain^). A compatibility chain c = {c_i ⊆ M_i}_{i∈I} is a family of sets of models of the L_i such that each c_i is either empty or a singleton. We call c_i the i-th element of c. A compatibility chain is nonempty if one of its components is nonempty.

A compatibility chain represents a set of “instantaneous snapshots of the world”, each of which is taken from the point of view of the associated context. Since contexts describe points of view on the same world, certain combinations of snapshots are possible while others can never happen. To distinguish between these two sets, LMS introduces the notion of compatibility relation (defined below), which represents the “admissible” combinations of snapshots.

Definition 4 (Compatibility relation and LMS-model). A compatibility relation is a set of compatibility chains. An LMS-model is a compatibility relation that contains a nonempty compatibility chain.

^ For the sake of this paper, we use a definition of compatibility chain which is specialized and simpler than the one given in [7].
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Definition 5 (Satisfiability and Entailment). Let \= be the propositional 
classical satisfiability relation. We extend the definition of \= as follows: 

1. for any f G Li, d \= 4> if, for all m G d, m \= f; 

2. c i . (f) if Ci\^ (f), 

3. C\=i:(j)if, for all c G C, c \= i : (j>; 

4- A \=ci 4’ if, for oil m G Ci, if m\= Fi, then m \= 4; 

5. r i ■ 4 if, either there is a j i, such that cj ^ Fj, or Fi \=^. 4, 

6. F i-4if, for all c G C, F i ■ 4i 

7. For any class of models £, F i:4if, for all models C G F |=:c i '■ 4- 

We adopt the usual terminology of satisfiability and entailment for statements about the relation $\models$. Thus we say that $c$ satisfies $\phi$ at $i$, or equivalently that $\phi$ is true in $c_i$, to refer to the fact that $c_i \models \phi$. We say that $\Gamma$ entails $i:\phi$ in $c$ to refer to the fact that $\Gamma \models_c i:\phi$. Similar terminology is adopted for $\Gamma \models_C i:\phi$ and $\Gamma \models_{\mathcal{C}} i:\phi$.
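The clauses of Definition 5 translate almost line by line into executable checks. In the following Python sketch, which is an illustration rather than part of the formalism, a formula of $L_i$ is represented by its truth function over assignments, contexts are indexed from 0, and a premise set $\Gamma$ is a list of `(index, formula)` pairs. All of these representation choices are assumptions made for the example.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Assignment = FrozenSet[str]                 # a model of L_i: the set of letters it makes true
Formula = Callable[[Assignment], bool]      # a formula of L_i, given by its truth function
Chain = Tuple[FrozenSet[Assignment], ...]   # component i is empty or a singleton
Labelled = Tuple[int, Formula]              # a labelled formula i:phi

def sat_component(c_i: FrozenSet[Assignment], phi: Formula) -> bool:
    """Clause 1: c_i |= phi iff every m in c_i satisfies phi (vacuously true if c_i is empty)."""
    return all(phi(m) for m in c_i)

def sat_chain(c: Chain, i: int, phi: Formula) -> bool:
    """Clause 2: c |= i:phi iff c_i |= phi."""
    return sat_component(c[i], phi)

def sat_relation(C: Iterable[Chain], i: int, phi: Formula) -> bool:
    """Clause 3: C |= i:phi iff every chain of C satisfies i:phi."""
    return all(sat_chain(c, i, phi) for c in C)

def entails_chain(c: Chain, gamma: List[Labelled], i: int, phi: Formula) -> bool:
    """Clauses 4-5: Gamma |=_c i:phi."""
    # Clause 5, first disjunct: some other component c_j violates its premises Gamma_j.
    for j in range(len(c)):
        gamma_j = [g for (k, g) in gamma if k == j]
        if j != i and not all(sat_component(c[j], g) for g in gamma_j):
            return True
    # Clause 4: every m in c_i that satisfies Gamma_i also satisfies phi.
    gamma_i = [g for (k, g) in gamma if k == i]
    return all(phi(m) for m in c[i] if all(g(m) for g in gamma_i))

def entails_relation(C, gamma, i, phi) -> bool:
    """Clause 6: Gamma |=_C i:phi iff Gamma |=_c i:phi for every chain c in C."""
    return all(entails_chain(c, gamma, i, phi) for c in C)

def entails_class(models, gamma, i, phi) -> bool:
    """Clause 7: Gamma entails i:phi in every model C of the class."""
    return all(entails_relation(C, gamma, i, phi) for C in models)
```

For instance, with a chain whose context 0 holds the single model `{"p"}` and whose context 1 holds the empty assignment, the premise `(1, q)` fails in context 1, so the chain entails any `0:phi` through the first disjunct of clause 5.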

MultiContext Systems (MCS) are a class of proof systems for LMS. The key notion of an MCS is that of bridge rule.

Definition 6 (Bridge Rule). A bridge rule on a set of indices $I$ is a rule of the form:

$$\frac{i_1:\phi_1 \quad \ldots \quad i_n:\phi_n}{i:\phi}$$

where $i_1, \ldots, i_n, i \in I$. A bridge rule can be associated with a restriction, namely a criterion which states the conditions of its applicability.



Definition 7 (MultiContext System (MCS)). A MultiContext System for a family of languages $\{L_i\}_{i \in I}$ is a pair $MS = \langle \{C_i = \langle L_i, \Omega_i, \Delta_i \rangle\}_{i \in I}, \Delta_{br} \rangle$, where each $C_i = \langle L_i, \Omega_i, \Delta_i \rangle$ is a theory (on the language $L_i$, with axioms $\Omega_i$ and natural deduction inference rules $\Delta_i$), and $\Delta_{br}$ is a set of bridge rules on $I$.

MCSs are a generalization of Natural Deduction (ND) systems m. The generalization amounts to using formulae tagged with the language they belong to. This allows for the effective use of multiple languages. The deduction machinery of an MCS is the composition of two kinds of inference rules: local rules, namely the inference rules in each $\Delta_i$, and bridge rules. Local rules formalize reasoning within a context (i.e., they are only applied to formulae with the same index), while bridge rules formalize reasoning across different contexts.

Deductions in an MCS are trees of formulae which are built starting from a finite set of assumptions and axioms, possibly belonging to distinct languages, by a finite number of applications of local rules and bridge rules.
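To see how local rules and bridge rules interleave, the following Python sketch saturates a set of labelled facts under a single local rule (modus ponens applied within one index) and under unrestricted bridge rules. It is a deliberately simplified forward-chaining illustration, not the full natural-deduction machinery of an MCS: assumptions and their discharge, restrictions, and all other local rules are left out, and the tuple encoding `("->", A, B)` for implication is an assumption of the example.

```python
def closure(facts, bridge_rules):
    """Saturate labelled facts (index, formula) under local modus ponens and bridge rules.

    A formula is a letter (str) or an implication ("->", A, B); a bridge rule is a pair
    (premises, conclusion) whose premises and conclusion are labelled formulas.
    """
    derived = set(facts)
    changed = True
    while changed:
        new = set()
        # Local rule: from i:A and i:(A -> B) infer i:B, only within the same index i.
        for (i, f) in derived:
            if isinstance(f, tuple) and f[0] == "->" and (i, f[1]) in derived:
                new.add((i, f[2]))
        # Bridge rules: premises may live in other contexts than the conclusion.
        for premises, conclusion in bridge_rules:
            if all(p in derived for p in premises):
                new.add(conclusion)
        changed = not new <= derived
        derived |= new
    return derived

facts = {(1, "p"), (2, ("->", "q", "s"))}
rules = [(((1, "p"),), (2, "q"))]          # a bridge rule from 1:p to 2:q
print((2, "s") in closure(facts, rules))   # True: bridge step, then a local step in context 2
print((2, "s") in closure(facts, []))      # False: local rules alone cannot cross contexts
```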

In this paper, we present a definition of MC system which is suitable for our purposes. 

For a fully general presentation, see |si|. 
2.3 Lifting Axioms and Bridge Rules

A crucial feature of a formal theory of context, present both in LMS/MCS and in PLC, is the possibility of specifying relations between facts of different contexts. This is an essential feature of contextual reasoning, as contexts are not simply unrelated representations, but typically are different representations of the same world. For example, two contexts may describe the same piece of the world from the same perspective but at different levels of detail, or they may describe the same piece of the world from different perspectives. PLC formalizes relations between contexts via lifting axioms, while LMS/MCS uses bridge rules. Lifting axioms are defined as

“ . . . axioms which relate the truth in one context to the truth in another 
context. Lifting is the process of inferring what is true in one context 
based on what is true in another context by the means of lifting axioms” 

m 

The general form of lifting axioms is the following:

$$ist(\kappa_1, \phi_1) \wedge \ldots \wedge ist(\kappa_n, \phi_n) \supset ist(\kappa, \phi) \qquad (1)$$



As any formula in PLC, lifting axioms must be stated in a context. The lifting axiom above can be intuitively read as “$\phi$ is true in a context $\kappa$ if the formulas $\phi_1, \ldots, \phi_n$ are true in the contexts $\kappa_1, \ldots, \kappa_n$ respectively”.

Bridge rules, introduced in Definition 6, are inference rules whose premises and conclusion belong to different contexts. They can be thought of as a generalization of Natural Deduction inference rules m to rules which involve more than one index. For the sake of this paper we consider only bridge rules of the following form:

$$\frac{\kappa_1:\phi_1 \quad \ldots \quad \kappa_n:\phi_n}{\kappa:\phi}\,br \qquad (2)$$

The above bridge rule roughly formalizes the same intuition as that formalized by lifting axiom (1).

The main difference between lifting axioms and bridge rules is that lifting axioms are stated in an external context, which must be expressive enough to represent facts of all the contexts involved (using ist-formulae), whereas bridge rules allow stating relations between contexts without the need for an external context. There are situations where having an external context may be an advantage (for example, when one needs to reason about lifting axioms themselves, e.g., to discover that a lifting axiom is redundant or leads to inconsistent contexts). However, in general, specifying an external context can be very costly, especially when there are many interconnected contexts, as the external context essentially duplicates the information of each context. LMS/MCS allows both solutions. Indeed, instead of using bridge rules to lift a fact from $\kappa_1$ to $\kappa_2$, one can define a third context connected with $\kappa_1$ and $\kappa_2$ via bridge rules and explicitly add an axiom like (1) to this new context.^ This last observation constitutes the underlying idea of the proof, described in Q], that PLC can be embedded in LMS/MCS. The converse question, i.e., whether LMS/MCS can be reconstructed in PLC, is answered in the rest of this paper. As a consequence, we will gain a sharper intuition of the analogies and differences between bridge rules and lifting axioms.

^ This approach was used, for example, in the solution to the qualification problem presented in P .

Fig. 2. Embedding LMS/MCS into PLC

3 Reconstructing LMS/MCS in PLC 

Since a comparison of the two logical systems should be done on common ground, we consider LMS/MCS with homogeneous languages, i.e., LMS/MCS whose contexts all have the same propositional language. Indeed, as shown by Theorem 1, PLC does not support contexts with different languages. Similarly, we restrict the comparison to LMS/MCS in which all contexts have the same inference engine, that is, all contexts are classical propositional theories.

The general intuition for encoding an MCS into PLC is shown in Figure 2. Given an MCS with contexts $I$, we define a PLC with contexts $I$ (one for each context in the MCS) and an additional (meta/external) context $e$. The content of each context in $I$ and the compatibility relations (bridge rules) between contexts are described via ist-formulas in $e$. The representation of the content of the MCS contexts is quite straightforward: any formula $i:\phi$ in MCS is translated into a formula $e: ist(i, \phi)$ in PLC. For bridge rules, the translation is trickier. Indeed, the intuition that a bridge rule like (2) is translated into the lifting axiom (1) does not work: the following theorem proves a first important fact, namely that in general bridge rules cannot be modeled in PLC only as a set of lifting axioms. Let $BR_I$ be the set of bridge rules between a set $I$ of contexts with language $L_i = L_j$ (for $i, j \in I$).

Let $LA_I$ be the set of lifting axioms among the contexts $I$ expressed in a new context $e$ not in $I$. The notation $\Gamma \vdash_{\mathbf{br}} i:\phi$ stands for: $i:\phi$ is derivable from $\Gamma$ in the MCS with the set $I$ of contexts, no axioms, and the set $\mathbf{br}$ of bridge rules.
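Theorem 2. There is no transformation $la: BR_I \to LA_I$ such that for any finite subset $\mathbf{br} \subseteq BR_I$ of bridge rules:

$$i_1:\phi_1, \ldots, i_n:\phi_n \vdash_{\mathbf{br}} i:\phi \quad \text{if and only if} \quad \vdash_e \bigwedge_{br \in \mathbf{br}} la(br) \supset \big(ist(i_1, \phi_1) \wedge \ldots \wedge ist(i_n, \phi_n) \supset ist(i, \phi)\big) \qquad (3)$$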



Proof. The theorem is proved by counterexample. Consider the following two bridge rules:

$$\frac{1:p}{2:q}\,br_{12} \qquad\qquad \frac{2:q}{1:r}\,br_{21} \qquad (4)$$

where $p$, $q$, and $r$ are three distinct propositional letters. Let $br_{12}$ and $br_{21}$ both be unrestricted (i.e., always applicable). Taken separately, $br_{12}$ and $br_{21}$ do not affect theoremhood in either context 1 or context 2. Formally, for $i = 1, 2$, $\vdash_{br_{12}} i:\phi$ if and only if $\phi$ is a propositional tautology, and analogously $\vdash_{br_{21}} i:\phi$ if and only if $\phi$ is a tautology (see |j^ for a proof of a similar fact). Instead, when $br_{12}$ and $br_{21}$ are combined in the same MCS, new theorems which are not tautologies can be proved. An example of such a theorem is $1:p \supset r$, and its proof is the following:

$$\dfrac{\dfrac{\dfrac{1:p\ (*)}{2:q}\,br_{12}}{1:r}\,br_{21}}{1:p \supset r}\,{\supset}I\ \text{(discharging the assumption } (*)\text{)}$$



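Let $la(br_{12})$ and $la(br_{21})$ be the following general conjunctions of lifting axioms:

$$la(br_{12}) = \bigwedge_{m=1}^{M} \Big(\bigwedge_{k=1}^{K_m} ist(i_{mk}, \phi_{mk}) \supset ist(j_m, \psi_m)\Big)$$

$$la(br_{21}) = \bigwedge_{n=M+1}^{N} \Big(\bigwedge_{k=1}^{K_n} ist(i_{nk}, \phi_{nk}) \supset ist(j_n, \psi_n)\Big)$$

where $i_{mk}$, $i_{nk}$, and $j_m$ are either 1 or 2. Posing $\mathbf{br} = \{br_{12}, br_{21}\}$, we have that $\bigwedge_{br \in \mathbf{br}} la(br)$ is equivalent to the following formula:

$$\bigwedge_{n=1}^{N} \Big(\bigwedge_{k=1}^{K_n} ist(i_{nk}, \phi_{nk}) \supset ist(j_n, \psi_n)\Big)$$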

Suppose, for contradiction, that equivalence (3) holds. Since $1:p \supset r$ is derivable via $br_{12}$ and $br_{21}$, we have that

$$\vdash_e \bigwedge_{br \in \mathbf{br}} la(br) \supset ist(1, p \supset r) \qquad (7)$$

Consider the PLC-model $\mathfrak{M}$ with $\mathfrak{M}(1)$ equal to all the assignments for $L_1$ and $\mathfrak{M}(2)$ equal to all the assignments for $L_2$. Since $p \supset r$ is not valid, there is an assignment $\nu$ such that $\nu \not\models p \supset r$. By construction, $\mathfrak{M}(1)$ contains all the assignments to $L_1$. As a consequence, $\mathfrak{M} \not\models ist(1, p \supset r)$. Soundness of PLC and (7) entail that $\mathfrak{M} \not\models \bigwedge_{br \in \mathbf{br}} la(br)$, and therefore that there is an $n \leq N$ such that

$$\mathfrak{M} \models \bigwedge_{k=1}^{K_n} ist(i_{nk}, \phi_{nk}) \quad \text{and} \quad \mathfrak{M} \not\models ist(j_n, \psi_n) \qquad (8)$$

The left part of (8) states that each $\phi_{nk}$ (with $1 \leq k \leq K_n$) is a tautology, as it must be true in all the assignments in $\mathfrak{M}(i_{nk})$. As a consequence, we have that

$$\vdash_e \bigwedge_{k=1}^{K_n} ist(i_{nk}, \phi_{nk}) \qquad (9)$$

The right part of (8) states that there is an assignment $\nu \in \mathfrak{M}(j_n)$ such that $\nu \not\models \psi_n$, i.e., $\psi_n$ is not a tautology. Let us consider two cases, $n \leq M$ and $n > M$.
In the first case, due to the definition of $la(br_{12})$, we have that

$$\vdash_e la(br_{12}) \supset \Big(\bigwedge_{k=1}^{K_n} ist(i_{nk}, \phi_{nk}) \supset ist(j_n, \psi_n)\Big) \qquad (10)$$

while in the second case we have:

$$\vdash_e la(br_{21}) \supset \Big(\bigwedge_{k=1}^{K_n} ist(i_{nk}, \phi_{nk}) \supset ist(j_n, \psi_n)\Big) \qquad (11)$$

By applying Modus Ponens to (9) and (10), or to (9) and (11), we obtain one of the following two consequences:

$$\vdash_e la(br_{12}) \supset ist(j_n, \psi_n) \quad \text{or} \quad \vdash_e la(br_{21}) \supset ist(j_n, \psi_n)$$

If equivalence (3) held, we would have that either $\vdash_{br_{12}} j_n:\psi_n$ or $\vdash_{br_{21}} j_n:\psi_n$, while $\psi_n$ is not a tautology. But this is a contradiction.
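The key step of the counterexample, namely that $br_{12}$ and $br_{21}$ together force $1:p \supset r$ while neither does alone, can be checked semantically by brute force. The Python sketch below assumes that a chain satisfies an unrestricted bridge rule when, whenever all premises hold in the corresponding components, the conclusion holds in the target component; that components are empty or singletons as in Definition 3; and that both contexts share the letters p, q, r (homogeneous languages). It illustrates the claim only and is not part of the proof.

```python
from itertools import product

LETTERS = ("p", "q", "r")   # homogeneous language: same letters in contexts 1 and 2

def assignments():
    """All truth assignments, each as the frozenset of letters made true."""
    return [frozenset(l for l, b in zip(LETTERS, bits) if b)
            for bits in product((False, True), repeat=len(LETTERS))]

def chains():
    """All chains over contexts 1 and 2: each component is empty or a singleton."""
    comps = [frozenset()] + [frozenset({v}) for v in assignments()]
    return [{1: a, 2: b} for a, b in product(comps, comps)]

def holds(component, phi):
    return all(phi(v) for v in component)

def satisfies(chain, rule):
    """Unrestricted bridge rule, checked chain by chain (assumed semantics)."""
    premises, (j, psi) = rule
    if all(holds(chain[i], phi) for i, phi in premises):
        return holds(chain[j], psi)
    return True

p = lambda v: "p" in v
q = lambda v: "q" in v
r = lambda v: "r" in v
p_implies_r = lambda v: ("p" not in v) or ("r" in v)

br12 = ([(1, p)], (2, q))
br21 = ([(2, q)], (1, r))

def valid_at_1(rules, phi):
    """Does phi hold in context 1 of every chain satisfying all `rules`?"""
    return all(holds(c[1], phi) for c in chains()
               if all(satisfies(c, rule) for rule in rules))

print(valid_at_1([br12], p_implies_r))        # False
print(valid_at_1([br21], p_implies_r))        # False
print(valid_at_1([br12, br21], p_implies_r))  # True
```

With $br_{12}$ alone, the chain pairing the assignment {p} in context 1 with {q} in context 2 is admissible and falsifies $p \supset r$; adding $br_{21}$ removes every such chain.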

Lifting axioms are not the only possible ist-formulas. There are ist-formulas, for instance $\neg ist(i, \phi)$ or $ist(i, \phi) \supset ist(j, \psi) \vee ist(k, \theta)$, which are not lifting axioms but could be used to represent the compatibility relation formulated by bridge rules. So the question arises of whether bridge rules can be encoded by generic ist-formulas in some external context $e$. In the following we show that this is the case for MCSs with a finite number of contexts and with finite languages.

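Theorem 3. There is a transformation $a(\cdot)$ from finite sets $\mathbf{br} \subseteq BR_I$ of bridge rules to ist-axioms, and a context $e$ such that:

$$i_1:\phi_1, \ldots, i_n:\phi_n \vdash_{\mathbf{br}} i:\phi \quad \text{if and only if} \quad \vdash_e a(\mathbf{br}) \supset \big(ist(i_1, \phi_1) \wedge \ldots \wedge ist(i_n, \phi_n) \supset ist(i, \phi)\big) \qquad (12)$$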
Proof. The proof is constructive, i.e., we define the transformation $a(\cdot)$ for each set of bridge rules. The definition of $a(\mathbf{br})$ passes through a syntactic encoding of the LMS-models for $\mathbf{br}$.

Let $C$ be an LMS-model (i.e., a set of chains); the set of PLC-models $\mathfrak{M}_C$ corresponding to $C$ is defined as follows:

$$\mathfrak{M}_C = \Big\{\mathfrak{M}_{C'} \;\Big|\; C' \subseteq C \text{ and, for any } i \in I,\ \mathfrak{M}_{C'}(i) = \bigcup_{c \in C'} c_i\Big\} \qquad (13)$$
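Equation (13) is a direct construction, so it can be sketched in a few lines of Python. In this illustration, which is not the authors' code, chains and PLC-models are tuples of frozensets indexed by context position, and only nonempty subsets $C'$ are used; the latter is an assumption, since the equation itself does not say whether the empty subset is included.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Assignment = FrozenSet[str]
Chain = Tuple[FrozenSet[Assignment], ...]     # component i is c_i (empty or a singleton)
PLCModel = Tuple[FrozenSet[Assignment], ...]  # context i -> set of assignments M(i)

def plc_models_of(C: Iterable[Chain]) -> Set[PLCModel]:
    """Eq. (13) sketch: for each nonempty C' subset of C, M_{C'}(i) is the union of the c_i."""
    chains = list(C)
    n = len(chains[0]) if chains else 0
    models = set()
    for k in range(1, len(chains) + 1):          # assumption: only nonempty C'
        for sub in combinations(chains, k):
            models.add(tuple(frozenset().union(*(c[i] for c in sub)) for i in range(n)))
    return models

def plc_models_of_class(class_of_models) -> Set[PLCModel]:
    """The union of M_C over all LMS-models C of the class."""
    out = set()
    for C in class_of_models:
        out |= plc_models_of(C)
    return out

P, Q = frozenset({"p"}), frozenset({"q"})
NOT_Q = frozenset()                              # the assignment making q false
c = (frozenset({P}), frozenset({NOT_Q}))
e = (frozenset({P}), frozenset({Q}))
print(len(plc_models_of({c, e})))                # 3: from {c}, {e} and {c, e}
```

A single chain $c$ yields the PLC-model with $\mathfrak{M}(i) = c_i$, which is exactly the model $\mathfrak{M}_{\{c\}}$ used later in the proof.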



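Let $\mathcal{C}$ be the set of LMS-models for $\mathbf{br}$. The set $\mathfrak{M}_{\mathcal{C}}$ is defined as $\bigcup_{C \in \mathcal{C}} \mathfrak{M}_C$. Let us prove that the logical consequence defined by $\mathcal{C}$ can be represented by valid formulas in the set of models $\mathfrak{M}_{\mathcal{C}}$, i.e., that:

$$i_1:\phi_1, \ldots, i_n:\phi_n \models_{\mathcal{C}} i:\phi \quad \text{iff} \quad \text{for all } \mathfrak{M} \in \mathfrak{M}_{\mathcal{C}},\ \mathfrak{M} \models_e ist(i_1, \phi_1) \wedge \ldots \wedge ist(i_n, \phi_n) \supset ist(i, \phi) \qquad (14)$$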



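Suppose that $i_1:\phi_1, \ldots, i_n:\phi_n \models_{\mathcal{C}} i:\phi$. Let $\mathfrak{M}_{C'} \in \mathfrak{M}_C$, with $C' \subseteq C \in \mathcal{C}$. Suppose that $\mathfrak{M}_{C'} \models_e ist(i_k, \phi_k)$ for any $1 \leq k \leq n$. This implies that for all $c \in C'$, $c_{i_k} \models \phi_k$. From the hypothesis we have that $c_i \models \phi$ for all $c \in C'$, and therefore that $\mathfrak{M}_{C'} \models_e ist(i, \phi)$.

Vice versa, let us prove that $\mathfrak{M} \models_e ist(i_1, \phi_1) \wedge \ldots \wedge ist(i_n, \phi_n) \supset ist(i, \phi)$ for all $\mathfrak{M} \in \mathfrak{M}_{\mathcal{C}}$ implies that for any model $C$ of $\mathbf{br}$ and for any chain $c \in C$, if $c_{i_k} \models \phi_k$ for $1 \leq k \leq n$, then $c_i \models \phi$. Notice that, for any $c \in C \in \mathcal{C}$, we have that $\mathfrak{M}_{\{c\}} \in \mathfrak{M}_C$. By definition (see equation (13)), $\mathfrak{M}_{\{c\}}$ is such that $\mathfrak{M}_{\{c\}}(i) = c_i$. By hypothesis we have that $\mathfrak{M}_{\{c\}} \models_e ist(i_1, \phi_1) \wedge \ldots \wedge ist(i_n, \phi_n) \supset ist(i, \phi)$, which implies that if $c_{i_k} \models \phi_k$ for all $1 \leq k \leq n$, then $c_i \models \phi$.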

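To define $a(\mathbf{br})$ we proceed as follows: for any PLC-model $\mathfrak{M} \in \mathfrak{M}_{\mathcal{C}}$ we find a formula $\phi_{\mathfrak{M}}$ that axiomatizes exactly $\mathfrak{M}$. Then the axiomatization of $\mathfrak{M}_{\mathcal{C}}$ can be obtained as the disjunction of all the axiomatizations $\phi_{\mathfrak{M}}$ associated to each single PLC-model $\mathfrak{M}$ of $\mathfrak{M}_{\mathcal{C}}$ (this definition is possible because $\mathfrak{M}_{\mathcal{C}}$ is finite).

Let $\mathfrak{M} \in \mathfrak{M}_{\mathcal{C}}$, and let $\phi_{\mathfrak{M}}$ be the following formula:

$$\phi_{\mathfrak{M}} = \bigwedge_{i \in I} \Big( ist\big(i, \bigvee_{\nu \in \mathfrak{M}(i)} \phi_\nu\big) \wedge \bigwedge_{\nu \in \mathfrak{M}(i)} \neg ist(i, \neg \phi_\nu) \Big) \qquad (15)$$

where $\phi_\nu$ is the conjunction of all the literals verified by the assignment $\nu$. $\phi_{\mathfrak{M}}$ is a finite formula, since the set $I$ of contexts is finite and the set of literals in each context is finite too. By adding (15) as an axiom in the context $e$ we obtain a PLC that is satisfied only by the model $\mathfrak{M}$. Let

$$a(\mathbf{br}) = \bigvee_{\mathfrak{M} \in \mathfrak{M}_{\mathcal{C}}} \phi_{\mathfrak{M}}$$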
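The axiomatization (15) and the disjunction $a(\mathbf{br})$ can be tested against a small evaluator. The Python sketch below assumes a finite letter set per context, encodes formulas as nested tuples (a representation chosen only for this example), and evaluates $ist(i, f)$ in a PLC-model $\mathfrak{M}$ as "f holds in every assignment of $\mathfrak{M}(i)$". It then checks by enumeration that $\phi_{\mathfrak{M}}$ is satisfied by exactly one PLC-model.

```python
from itertools import combinations, product

def assignments(letters):
    """All assignments over `letters`, each as the frozenset of letters made true."""
    return [frozenset(a for a, bit in zip(letters, bits) if bit)
            for bits in product((False, True), repeat=len(letters))]

def phi_v(v, letters):
    """phi_v: the conjunction of the literals verified by assignment v."""
    return ("and", [("lit", a, a in v) for a in letters])

def phi_model(M, letters):
    """Eq. (15): per context i, ist(i, OR of phi_v over M(i)) and not ist(i, not phi_v) for v in M(i)."""
    parts = []
    for i, Mi in enumerate(M):
        parts.append(("ist", i, ("or", [phi_v(v, letters[i]) for v in Mi])))
        parts.extend(("not", ("ist", i, ("not", phi_v(v, letters[i])))) for v in Mi)
    return ("and", parts)

def a_br(models, letters):
    """a(br): the disjunction of phi_M over a finite set of PLC-models."""
    return ("or", [phi_model(M, letters) for M in models])

def ev_local(f, v):
    tag = f[0]
    if tag == "lit":
        return (f[1] in v) == f[2]
    if tag == "not":
        return not ev_local(f[1], v)
    if tag == "and":
        return all(ev_local(g, v) for g in f[1])
    if tag == "or":
        return any(ev_local(g, v) for g in f[1])
    raise ValueError(f"unknown connective {tag!r}")

def ev_plc(F, M):
    """Truth in context e: ist(i, f) holds iff f holds in every assignment of M(i)."""
    tag = F[0]
    if tag == "ist":
        return all(ev_local(F[2], v) for v in M[F[1]])
    if tag == "not":
        return not ev_plc(F[1], M)
    if tag == "and":
        return all(ev_plc(G, M) for G in F[1])
    if tag == "or":
        return any(ev_plc(G, M) for G in F[1])
    raise ValueError(f"unknown connective {tag!r}")

def all_plc_models(letters):
    """Every PLC-model over the given contexts: any subset of assignments per context."""
    per_ctx = []
    for ls in letters:
        vs = assignments(ls)
        per_ctx.append([frozenset(s) for k in range(len(vs) + 1) for s in combinations(vs, k)])
    return list(product(*per_ctx))

letters = (("p",), ("q",))
M = (frozenset({frozenset({"p"})}), frozenset({frozenset(), frozenset({"q"})}))
print([N for N in all_plc_models(letters) if ev_plc(phi_model(M, letters), N)] == [M])  # True
```

The first conjunct bounds $\mathfrak{M}(i)$ from above, and each negated $ist$ forces one assignment to be present; together they pin down $\mathfrak{M}(i)$ exactly, which is why a disjunction of such formulas characterizes a finite set of models.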

Let us now prove the equivalence (12). By soundness and completeness of $\mathbf{br}$, $i_1:\phi_1, \ldots, i_n:\phi_n \vdash_{\mathbf{br}} i:\phi$ holds if and only if

$$i_1:\phi_1, \ldots, i_n:\phi_n \models_{\mathcal{C}} i:\phi \qquad (16)$$
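By (14), we have that (16) holds if and only if, for all $\mathfrak{M} \in \mathfrak{M}_{\mathcal{C}}$,

$$\mathfrak{M} \models_e ist(i_1, \phi_1) \wedge \ldots \wedge ist(i_n, \phi_n) \supset ist(i, \phi) \qquad (17)$$

By construction of $a(\mathbf{br})$, $\mathfrak{M} \models_e a(\mathbf{br})$ if and only if $\mathfrak{M} \in \mathfrak{M}_{\mathcal{C}}$. This implies that (17) holds if and only if

$$\models_e a(\mathbf{br}) \supset \big(ist(i_1, \phi_1) \wedge \ldots \wedge ist(i_n, \phi_n) \supset ist(i, \phi)\big) \qquad (18)$$

Finally, soundness and completeness of PLC imply that (18) holds if and only if $\vdash_e a(\mathbf{br}) \supset \big(ist(i_1, \phi_1) \wedge \ldots \wedge ist(i_n, \phi_n) \supset ist(i, \phi)\big)$, which concludes our proof.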

Theorem 3 shows that the translation from bridge rules to generic ist-formulas is possible. However, the question remains open whether a set of bridge rules can be translated into a set of ist-formulas which are lifting axioms. Here the answer is negative.

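Theorem 4. There does not exist a transformation $la(\cdot)$ from finite sets $\mathbf{br} \subseteq BR_I$ of bridge rules to a conjunction of lifting axioms, and a context $e$ such that:

$$i_1:\phi_1, \ldots, i_n:\phi_n \vdash_{\mathbf{br}} i:\phi \quad \text{if and only if} \quad \vdash_e la(\mathbf{br}) \supset \big(ist(i_1, \phi_1) \wedge \ldots \wedge ist(i_n, \phi_n) \supset ist(i, \phi)\big) \qquad (19)$$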



Proof. The proof is by counterexample. Consider the following LMS/MCS composed of two languages $L_1$ and $L_2$ containing the single propositions $p$ and $q$ respectively. Consider the following set of bridge rules:



1 : -ip 

TT7 



bri2 




2 : ~>q 
1 : p 



br 21 



2:g 

1 : -ip 



br. 



21 



1 : T 

2 : T 



J-12 



2 : T 
1 : T 



J-21 



where all the rules but those indexed with r are unrestricted. The chains that satisfy the unrestricted bridge rules are:

c={p,q), d={p,q), e={p,q) 



where $p$ denotes the model in which $p$ is true and $\bar{p}$ the model in which $p$ is false; similarly for $q$ and $\bar{q}$. The compatibility relations that satisfy the restricted bridge rules are:



{c}, {d}, {e}, {c,e}, {d,e} 

Following the definitions given in the proof of Theorem 3, one can see that the ist-formulas associated to the set of LMS-models above are equivalent to the following:

$$\neg ist(1, \bot) \wedge \neg ist(2, \bot) \wedge \big(ist(1, p) \vee ist(2, q)\big)$$



Notice that the above formula cannot be reduced to the form of a conjunction of lifting axioms.
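The claim that the listed compatibility relations correspond to the formula above can be checked by enumeration. The overlines distinguishing $p$ from $\bar{p}$ in the chains $c$, $d$, $e$ are lost in the source, so the following Python sketch assumes one reading, $c = (p, \bar{q})$, $d = (\bar{p}, q)$, $e = (p, q)$, which is consistent with the formula; it also assumes that only nonempty subsets $C'$ are used in equation (13). Under these assumptions it confirms that the PLC-models generated from $\{c\}, \{d\}, \{e\}, \{c, e\}, \{d, e\}$ are exactly the models of the formula.

```python
from itertools import combinations, product

P, NP = frozenset({"p"}), frozenset()   # models of L1: p true / p false
Q, NQ = frozenset({"q"}), frozenset()   # models of L2: q true / q false

# Assumed reading of the chains (the overlines are lost in the source).
c = (frozenset({P}), frozenset({NQ}))
d = (frozenset({NP}), frozenset({Q}))
e = (frozenset({P}), frozenset({Q}))
relations = [{c}, {d}, {e}, {c, e}, {d, e}]

def plc_models(C):
    """Eq. (13) sketch: union the components of every nonempty C' subset of C."""
    out = set()
    for k in range(1, len(C) + 1):
        for sub in combinations(list(C), k):
            out.add(tuple(frozenset().union(*(ch[i] for ch in sub)) for i in (0, 1)))
    return out

generated = set().union(*(plc_models(C) for C in relations))

def ist(M, i, phi):
    return all(phi(v) for v in M[i])

def formula(M):
    """not ist(1, False) and not ist(2, False) and (ist(1, p) or ist(2, q))."""
    return (not ist(M, 0, lambda v: False) and not ist(M, 1, lambda v: False)
            and (ist(M, 0, lambda v: "p" in v) or ist(M, 1, lambda v: "q" in v)))

subsets1 = [frozenset(s) for k in range(3) for s in combinations([P, NP], k)]
subsets2 = [frozenset(s) for k in range(3) for s in combinations([Q, NQ], k)]
satisfying = {M for M in product(subsets1, subsets2) if formula(M)}

print(generated == satisfying)  # True
print(len(generated))           # 5
```

The disjunction $ist(1, p) \vee ist(2, q)$ is what rules out the model generated by $\{c, d\}$, and it is this disjunction of ist-atoms that a conjunction of lifting axioms cannot express.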
4 Discussion



In the previous section we have given two somewhat opposite results, namely Theorem 2 and Theorem 3. Intuitively, the former states that bridge rules cannot be transformed into lifting axioms in such a way that the translation composes; the latter states that finite sets of bridge rules can be translated into finite sets of ist-formulas. These two results constitute two boundaries within which one can look for further correspondence results.

Theorem 2 states that a set of bridge rules cannot be translated into a set of lifting axioms simply by translating each single bridge rule into a lifting axiom. This is intuitively due to the fact that bridge rules allow for inter-leaving of local reasoning, while lifting axioms do not. By inter-leaving of local reasoning we mean a reasoning pattern composed of a sequence of chunks of local reasoning. This reasoning pattern allows for cyclic contextual reasoning. For instance, one starts in a context $\kappa_1$, switches to a context $\kappa_2$, then switches back to the context $\kappa_1$ and then again to the context $\kappa_2$. Consider the bridge rules given in the counterexample of the proof of Theorem 2, plus the bridge rule:

$$\frac{1:p \supset r}{2:s}\,br'_{12}$$



An example of inter-leaving of local reasoning is the following proof of $2:s$:

$$\dfrac{\dfrac{\dfrac{\dfrac{1:p\ (*)}{2:q}\,br_{12}}{1:r}\,br_{21}}{1:p \supset r}\,{\supset}I\ \text{(discharging the assumption } (*)\text{)}}{2:s}\,br'_{12}$$



PLC does not support inter-leaving of local reasoning. The reasoning pattern implemented in PLC, instead, is a “bottom-up combination of local reasoning” in a tower of transcendent contexts. In this reasoning pattern one starts from the bottom of a tower of contexts and reasons locally in a (set of) context(s), say in the context denoted by the sequence $\kappa_1 \ldots \kappa_n \kappa_{n+1}$; then one transcends, by (CS), to the context $\kappa_1 \ldots \kappa_n$ and reasons locally there (e.g., by using the lifting axioms); then one transcends again to $\kappa_1 \ldots \kappa_{n-1}$. Eventually, one stops at some point of the tower. Theorem 2 shows that “inter-leaving of local reasoning” cannot be reduced to “bottom-up combination of local reasoning + lifting axioms”.

Theorem 3, instead, provides a way to translate LMS/MCS into PLC. Furthermore, the counterexample provided in Theorem 4 shows that the translation proposed in Theorem 3 is the “simplest” one, i.e., that any other translation cannot be reduced to a conjunction of lifting axioms. If one wants to rewrite bridge rules into lifting axioms, one has to take into account the following two points:

1. In embedding LMS/MCS into PLC, bridge rules are not directly translated into implications, as one could expect. For instance, an MCS containing bridge rules between contexts 1 and 2 is not translated into axioms of the form $ist(1, p) \supset ist(2, q)$ and $ist(2, q) \supset ist(1, p)$, as shown in Section 3. Indeed, the PLC formalizing such bridge rules is not computed by a direct (syntactic) translation of the bridge rules of the MCS: the axiom $a(\mathbf{br})$ is determined by enumerating all the LMS-models of $\mathbf{br}$ and by axiomatizing them in a PLC-formula. This is not a problem of our translation; indeed, any alternative translation which is equivalent to $a(\mathbf{br})$ with more than two contexts cannot be reduced to a set of lifting axioms.

2. The above translation is not compositional. This means that, if $PLC_1$ and $PLC_2$ are the representations of $MCS_1$ and $MCS_2$ respectively, then the translation of $MCS_1 \cup MCS_2$ (i.e., the MCS containing the axioms and the bridge rules of both $MCS_1$ and $MCS_2$) cannot be defined as the union of the axioms of $PLC_1$ and $PLC_2$.

5 Conclusions 

This paper concludes the technical and conceptual comparison between LMS/MCS and PLC we started in p]. The results presented in this paper help clarify the technical and conceptual differences between the two approaches, by showing how bridge rules can be represented by lifting axioms or by ist-formulas. In particular, we have shown that:

1. bridge rules cannot be translated into lifting axioms;

2. sets of bridge rules can be translated into sets of ist-formulas which cannot be reduced to a conjunction of lifting axioms.

We stress the fact that the two formalisms do not provide equivalent solutions, even if they share some of the intuitive motivations for having a formal theory of context in AI. The technical results we provided in the previous paper P] and in this paper justify the conclusion that LMS/MCS is more general than PLC, and that it captures some patterns of contextual reasoning in a more intuitive and straightforward way. Moreover, in our opinion, the restrictions needed to reconstruct LMS/MCS in PLC have a significant impact on the appropriateness of PLC to capture the intuitive desiderata of a logic of context in AI.
Abstract. This paper discusses the dynamics of context through the use of a context-based formalism called contextual graphs, which was initially developed in the SART application for the development of a support system for incident solving on a subway line. First, we present the formalism of contextual graphs through its new implementation. Second, we discuss the dynamics of context in contextual graphs. Third, we present two characteristics of contextual graphs as they relate to the dynamics of context: incremental knowledge acquisition and explanation generation. We conclude with a discussion of the key properties and the potential of contextual graphs for other applications.

Keywords: Contextual graphs, explanation, visual explanations, context 
dynamic, applications 



1 Introduction 

Brezillon [1,2] defined context as a collection of relevant conditions and surrounding influences that make a situation unique and comprehensible. Based on this initial work, Pomerol and Brezillon [19] showed strong relationships between context and knowledge. Pasquier et al. [16] gave an example of the application of these ideas in the SART application for monitoring a subway line. A large volume of knowledge (about trains, electricity, people's reactions, and so on) contributes to making each situation unique, while more particular conditions about the time, the day, the weather and so on influence many decisions. Brezillon and Pomerol [7] proposed three types of context, called external knowledge, contextual knowledge and proceduralized context.

At a given step of a decision process or of the accomplishment of a task, we 
distinguish between the part of the context which is relevant at this step, and the part 
which is irrelevant. The latter part is called external knowledge. The former part is 
called contextual knowledge, and obviously depends on the individual agent and on 
the decision at hand. Moreover, there is a part of the contextual knowledge that is 
proceduralized at this step, which we refer to as the proceduralized context. The 
proceduralized context is invoked, structured and situated according to a given focus. 

An important issue is the passage from contextual knowledge to proceduralized 
context. This proceduralization process [17, 18] is task-oriented and provides a 
consistent explanatory framework to anticipate the results of a decision or an action. 
This point is particularly salient when a company establishes procedures and its
employees contextualize these procedures to develop efficient practices. 

Companies establish procedures that are collections of safety action sequences allowing a given problem to be solved in a wide set of circumstances. These procedures are supposed to cover large classes of problems, whatever the conditions in which the problems must be solved. This is a kind of uniformization of problem solving, but it often results in sub-optimal solutions. Conversely, each operator develops their own practice, tailoring the procedure in order to take into account the current context, which is particular and specific.

The modeling of operators’ reasoning (practices) is a difficult task because 
operators use a number of contextual elements, and because procedures for solving 
complex problems have some degree of freedom. Thus, it would be better to store 
advantages and disadvantages rather than the complete decision. 

This discussion points out that, while it is relatively easy to model procedures, modeling the corresponding practices is not an easy task, because there are as many practices as contexts of occurrence. Moreover, it is not possible to establish a global procedure for complex problem solving, but only a set of sub-procedures for solving different parts of the complex problems.

Based on the design of the contextual graphs for the SART application (e.g., see [19]), we present in this paper a new development of our context-based formalism that (1) goes beyond the SART application, (2) is relevant for problems dealing with procedures, practices and context, and (3) presents new functionality in terms of incremental acquisition of practices and explanation generation. The paper is organized as follows. First, we present the formalism of contextual graphs through its current implementation. This version of contextual graphs differs from the version presented previously [15] because we suppress assumptions concerning the storage and update of data that obscured the expressiveness of the formalism regarding the dynamics of context, the incremental acquisition of practices and the capacity for explanation generation. Second, we discuss the dynamics of context as represented in contextual graphs, as a movement of elements between the contextual knowledge and the proceduralized context, with the introduction of new elements from the external knowledge when a new practice has to be acquired. Third, we introduce the types of explanation of practices and problem solving that can be generated from contextual graphs. We conclude with a discussion of the properties and potential of contextual graphs.



2 Contextual Graphs 

2.1 Introduction 

The contextual-graph formalism was initially developed for an application for incident solving on a subway line ([8], |ittm//wwwdm). The general observation is that the company establishes procedures for incident solving, and the operator in charge of a subway line adapts the procedure for solving an incident to the context in which each incident occurs. This contextualized procedure is called a practice, and thus there are as many practices as different contexts encountered by the operator.

In our tests, operators appreciated how easily they could understand the system's behavior through the use of contextual graphs, in which knowledge and reasoning are used in a manner very close to the way they solve incidents [15]. An extension of this project could be the training of future operators, thanks to the expressiveness of contextual graphs, their manipulation (aggregation and expansion of parts of the contextual graph, etc.) and the possibility of replaying some incident solving to study potential variants. However, the use of contextual graphs is not limited to the SART application; it is relevant in all domains where operators' reasoning needs to contextualize “official” procedures in order to develop efficient practices by accounting for the context in which the practice is elaborated.

A contextual graph (hereafter also noted CxG) allows a context-based representation of a given problem solving for operational processes by taking into account the working environment [3]. The initial structure of a CxG (its skeleton) is defined by the procedure established by the company. The CxG is then progressively enriched by the practices that operators develop when applying the procedure in different contexts.

A path in a contextual graph represents a practice in which the operator's actions are intertwined with the contextual elements explicitly considered by the operator. A practice generally differs from another one by a few actions that are discriminated by a contextual element having different instantiations for the two practices. Once the divergence between the two practices disappears, the two practices are recombined into a unique path.



2.2 Elements of a Contextual Graph 

A contextual graph is an acyclic directed graph with a unique input, a unique output, 
and a serial-parallel organization of nodes connected by oriented arcs. A node can be 
an action, a contextual node, a recombination node, or a sub-graph (an activity). 

2.2.1 Actions and Activities 

An action is an executable method. An activity is a complex action assembling different elements, such as a contextual graph with a unique input and a unique output. Mechanisms of aggregation and expansion, as in conceptual graphs, allow users to have different views on a contextual graph and to transform an activity into an action.

An activity is identified as such by operators as a recurring structure observed in different contextual graphs. The identification of an activity is interesting because a change in an activity automatically appears in all the contextual graphs where the activity has been identified. Activities are organized in a directed hierarchy, an activity possibly calling sub-activities, so that the structure remains a directed acyclic graph.

2.2.2 Contextual Nodes and Recombination Nodes 

A contextual element is represented by two types of node, namely a contextual node and a recombination node. A contextual node corresponds to the explicit instantiation of the contextual element. For example, a contextual element could correspond to being in a hurry, with the instantiations "yes" and "no." A contextual node is represented by C(1, n), where n is the number of exclusive branches corresponding to known practices. The associated recombination node R(n, 1) corresponds to abandoning the instantiation of the contextual element once the action on the branch is accomplished. The different alternatives then converge towards the same action sequence to be executed afterwards.

Thus, at a contextual node, a piece of contextual knowledge becomes instantiated and enters the proceduralized context. At a recombination node, the last piece that entered the proceduralized context goes back to the contextual knowledge. A change in the context therefore corresponds to the movement of a piece of contextual knowledge into the proceduralized context, or conversely from the proceduralized context to the contextual knowledge.
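Because a recombination node always sends back the last piece that entered, the proceduralized context along a path behaves like a stack. The following Python sketch replays the upper path of Figure 1 and records the proceduralized context after each node. The string encoding ("Cj.k" for contextual node Cj with instantiation k, "Rj" for its recombination node, anything else for an action) is a hypothetical convention chosen for this illustration, not the representation used by the authors' software.

```python
def trace_context(path):
    """Replay a CxG path; return (node, proceduralized context after the node) pairs.

    Hypothetical encoding: "Cj.k" is contextual node Cj taking instantiation k,
    "Rj" is the recombination node of Cj, and any other label is an action.
    """
    proceduralized = []   # stack of (contextual element, instantiation)
    history = []
    for node in path:
        if node.startswith("C"):
            element, value = node.split(".")
            proceduralized.append((element, value))      # enters the proceduralized context
        elif node.startswith("R"):
            element, _ = proceduralized.pop()            # the last piece entered leaves it
            if element != "C" + node[1:]:
                raise ValueError(f"{node} does not close {element}")
        history.append((node, list(proceduralized)))
    return history

path = ["A1", "A2", "C1.1", "C2.1", "C3.1", "A3", "R3", "A5", "R2", "R1", "A9"]
for node, ctx in trace_context(path):
    print(node, ctx)
```

Action A3 is executed with all three instantiations in the proceduralized context, whereas A9, after the last recombination node, is executed with an empty one.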

Contextual and recombination nodes give contextual graphs the general structure of a spindle or a series of spindles, with a divergence of branches at contextual nodes initiated by a diagnosis, and a convergence at recombination nodes once the actions or activities have been carried out.

2.2.3 Sub-graphs 

A sub-graph represents a local reasoning (a diagnosis/action structure) corresponding to intermediate goals. A sub-graph can be an action, a sequence of actions, or a pair of contextual and recombination nodes. A sub-graph is itself a contextual graph: directed, acyclic, with one input and one output. If a sub-graph contains a contextual node on a branch, it necessarily contains its recombination node on the same branch. Conversely, if a sub-graph is on a branch, it contains at most all the items on the branch.

2.2.4 Parallel Action Grouping 

A parallel action grouping represents a set of m steps of a problem solving that can be carried out in parallel or in any order, but that must all be accomplished before continuing. For example, preparing coffee requires taking the coffee, the filter and the reservoir. These three actions can be executed in any order, but all of them must be accomplished before switching on the machine. The activity is judged globally with respect to a higher-level goal.

The type of coffee machine, for instance, generally does not appear explicitly in the coffee-preparation example, although it would make it possible to order the previous actions (e.g. if the place where the filter goes is fixed on the coffee machine). The ordering of the actions in a parallel action grouping depends on contextual elements that do not appear in the contextual graph. These elements are not at the same level of description, and they would constitute a dense net of contextual nodes leading to few solutions (see [4] for a discussion of this point). The parallel action grouping is thus a way to deal with the incompleteness or complexity of the local information.
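The constraint a parallel action grouping imposes can be checked on an executed trace: every member of the group must occur, in any order, before the next step. The function name `respects_grouping` and the action labels below are hypothetical. The sketch is our own reading of the coffee example.

```python
from itertools import permutations
from typing import List, Set


def respects_grouping(trace: List[str], group: Set[str], next_step: str) -> bool:
    """True if every action of the grouping occurs, in any order, before `next_step`."""
    if next_step not in trace:
        return False
    done_before = set(trace[:trace.index(next_step)])
    return group <= done_before                # subset test: all m actions accomplished


coffee = {"take coffee", "take filter", "take reservoir"}
# Every ordering of the three actions is acceptable: 3! = 6 valid orders.
orders = [list(p) + ["switch on machine"] for p in permutations(sorted(coffee))]
print(len(orders), all(respects_grouping(o, coffee, "switch on machine") for o in orders))
```

Representing the grouping as a set, rather than as a dense net of contextual nodes, is exactly what keeps the ordering factors out of the graph.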



2.2.5 An Example 

Figure 1 gives an example of a contextual graph. An action is represented by a square box. A contextual node is represented by a large circle, and Cj.k is instance k of the contextual node Cj (k = 1, ..., n). A recombination node Rj is represented by a small black circle. (Sub-graphs and parallel action groupings are neither represented nor discussed in this example.)




Fig. 1. An example of a contextual graph

The operator provides a practice as a sequence of actions such as {A1, A2, A3, A5, A9}. The corresponding path is the sequence of actions intertwined with contextual and recombination nodes: {A1, A2, C1.1, C2.1, C3.1, A3, R3, A5, R2, R1, A9}, the upper path in Figure 1.

A sub-graph can be any of the following:
- an action (e.g. A3);
- a sequence of actions (e.g. A1-A2);
- a pair of contextual and recombination nodes together with all the items between them (e.g. C3-A3/A4-R3);
- all the items on one branch between a contextual node and its recombination node (e.g. the upper branch of C2 for the value C2.1, with C3-A3/A4-R3-A5).



2.3 Practical Aspects: Implementation 

We developed software that exploits the contextual-graph formalism. It is currently implemented as a prototype written in Java, with all the data stored in a database. It offers the usual functionality: switching between languages (French and English, at any moment of a session); user identification, with two types of users, namely the "super-user", who can create a new graph, and the "user", who can only enrich a graph with new practices; enrichment and correction of all texts (immediately visible); online help; different types of visualization (graph resized to the dimensions of the window, aggregation and expansion of parts of the graph); comparison of graphs; coloring of sub-graphs (e.g. for identifying an activity found in different contextual graphs); visualization of the growth of the graph (one addition after the previous one, or the whole series of additions); comparison of action sequences; explanations for all the items (history, contextual information, etc.); identification of the context of each action; etc.
2.4 Related Works

The contextual graph approach has some points in common with the Ripple Down Rules (RDR) technique [9, 10]. RDR is a hybrid case-based and rule-based approach in which context is an important aspect, captured in the associated cases and in the exception structure. In RDR, a case can fall under only one rule. An error may be corrected by adding one exception rule that takes into account only the cases that previously fell under the rule to be corrected. There is a two-way dependency relation between rules (with if-true and if-false links), such that a rule's activation is investigated only in the context of another rule's activation.

Ripple down rules form a binary decision tree that differs from standard decision trees in two ways: compound clauses are used to determine branching, and these clauses need not exhaustively cover all cases, so a decision can be reached at an interior node. The RDR technique relies on the fact that people cope with the acquisition and maintenance of complex knowledge structures by making incremental changes to them within a well-defined context, so that the effect of changes is locally contained in a well-defined manner [11]. Thus the knowledge that is introduced is highly contextualized. The recommendation given by an expert depends on the context in which it is given. It is not a description of the expert's thought processes but a justification of why this recommendation was made.
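The RDR mechanism just described can be illustrated with a minimal single-classification sketch. This is our own simplification, not code from [9, 10]. Each hypothetical `Rule` has an if-true successor (an exception) and an if-false successor (an alternative). The conclusion returned is that of the last rule that fired, so a decision may be reached at an interior node. The subway-style conditions in the usage lines are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Case = Dict[str, object]  # attribute name -> value


@dataclass
class Rule:
    condition: Callable[[Case], bool]
    conclusion: str
    if_true: Optional["Rule"] = None   # exception: examined only if this rule fired
    if_false: Optional["Rule"] = None  # alternative: examined only if it did not fire


def classify(rule: Optional[Rule], case: Case, default: str) -> str:
    """Walk the binary tree and return the conclusion of the last rule that fired."""
    conclusion = default
    while rule is not None:
        if rule.condition(case):
            conclusion = rule.conclusion   # may be overridden by a later exception
            rule = rule.if_true
        else:
            rule = rule.if_false
    return conclusion


# Hypothetical knowledge base: one rule, then one exception added incrementally.
root = Rule(lambda c: bool(c["train_late"]), "announce delay")
case = {"train_late": True, "blocked": True}
print(classify(root, case, "no action"))   # judged wrong by the expert for blocked trains
# The correction is attached under `root`, so it only sees cases that fell under it.
root.if_true = Rule(lambda c: bool(c["blocked"]), "evacuate train")
print(classify(root, case, "no action"))
```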

Gonzalez and Ahlers [12] describe a knowledge representation paradigm for modeling the intelligent behavior of simulated agents in a simulator-based tactical trainer. Their hypothesis is that tactical knowledge is highly dependent upon the context (i.e. the situation being faced), and they proposed an approach called context-based reasoning (CxBR). CxBR encapsulates knowledge about appropriate actions and/or procedures, as well as possible new situations, into contexts. This paradigm has been tested in an application for submarine tactical officers on a patrol mission. Tactical knowledge is required to endow autonomous intelligent agents with the ability to act not only intelligently but also realistically in light of a trainee's actions. Gonzalez and Ahlers' work is based on the following idea: associating the possible situations and corresponding actions with specific contexts simplifies the identification of a situation, because only a subset of all possible situations is applicable under the active context.

Turner [22] developed an adaptive reasoner that makes context explicit so that autonomous underwater vehicles can tackle unanticipated events in complex environments. Contextual knowledge is represented as a set of contextual schemas (c-schemas). The reasoner retrieves the most appropriate c-schemas and uses them to behave appropriately in its current context. Turner describes context-mediated behavior (CMB), which is based on the idea that an agent has explicit knowledge about the contexts in which it may find itself and uses that knowledge when in those contexts. CMB is implemented in the Orca program, an intelligent controller for autonomous underwater vehicles.
3 Movement between Contextual Knowledge and the
Proceduralized Context 



3.1 The Three Types of Context in a Contextual Graph 

Brezillon and Pomerol [18] proposed three types of context: external knowledge, contextual knowledge and the proceduralized context. This distinction is expressed in the formalism of contextual graphs as follows.

External knowledge is knowledge that does not intervene in the contextual graph (i.e. it belongs to another contextual graph or does not exist in the database). It is a source of contextual knowledge through the incremental acquisition of new practices, as discussed below.

Contextual knowledge exists in the CxG (the context of the CxG is composed of all the contextual elements in the graph) but is not considered through an instantiation. At the level of a practice (a given path in a contextual graph), all the contextual elements that are not on the path represent contextual knowledge. At the level of an action, contextual knowledge corresponds to the contextual nodes outside the path on which the action lies.

The contextual elements belonging to that path, by contrast, are ordered in a sequence and considered through their instantiations, and thus constitute the proceduralized context. The proceduralized context is an ordered sequence of instantiated contextual elements on the path.
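The split between proceduralized context and contextual knowledge at a given action can be computed directly from a path. The sketch below is our own, assuming the path encoding of the Figure 1 example (`Cj.k` for an instantiated contextual node, `Rj` for its recombination node) and the graph context {C1, C2, C3, C4}. `context_at` is a hypothetical helper. Anything not in `elements` would count as external knowledge.

```python
from typing import Dict, List, Set, Tuple


def context_at(path: List[str], elements: Set[str], step: int) -> Tuple[Dict[str, str], Set[str]]:
    """Split the graph's context at path[step] into
    (proceduralized context, contextual knowledge)."""
    proceduralized: Dict[str, str] = {}        # insertion-ordered: element -> instantiation
    for item in path[:step]:                   # items already traversed
        if item.startswith("C"):               # "Cj.k": Cj enters with value Cj.k
            proceduralized[item.split(".")[0]] = item
        elif item.startswith("R"):             # "Rj": Cj returns to contextual knowledge
            proceduralized.pop("C" + item[1:], None)
    knowledge = elements - set(proceduralized)
    return proceduralized, knowledge


PATH = ["A1", "A2", "C1.1", "C2.1", "C3.1", "A3", "R3", "A5", "R2", "R1", "A9"]
ELEMENTS = {"C1", "C2", "C3", "C4"}
print(context_at(PATH, ELEMENTS, PATH.index("A3")))   # C1, C2, C3 instantiated; C4 left
print(context_at(PATH, ELEMENTS, PATH.index("A5")))   # C3 has gone back after R3
```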



3.2 Context at a Step of a Practice Execution 

The context of the contextual graph in Figure 1 is given by the elements {C1, C2, C3, C4}. The context of action A3 is composed of two parts: the contextual elements used on the path from the input to the action, and the other elements. The latter elements are contextual knowledge (e.g. C4). The former contextual elements are instantiated: C1 with the value C1.1, C2 with the value C2.1 and C3 with the value C3.1. Thus, the context of action A3 is defined by:

- The proceduralized context: {C1 with the value C1.1, C2 with the value C2.1, C3 with the value C3.1}, supposing that actions A1 and A2 have been carried out.

- The contextual knowledge: {C4}.

The context of action A3 is thus described in a fixed and static way.

Consider now the context of the path on which action A3 lies. Once action A3 has been executed, the value C3.1 of C3 no longer matters (i.e. at the recombination node R3). The contextual element C3 leaves the proceduralized context at recombination node R3 and goes back to the contextual knowledge. Thus, the context of action A5, which follows recombination node R3, is described by:

- The proceduralized context: {C1 with the value C1.1, C2 with the value C2.1}, and

- The contextual knowledge: {C3, C4}.

The context of action A5 is also described in a fixed and static way. It differs from the context of action A3 by the contextual element C3, which has moved from the proceduralized context to the contextual knowledge at the level of the practice. Thus, as the practice execution progresses from action A3 to action A5, the context of the practice evolves, while the contexts of A3 and A5 themselves are static.
3.3 Context Evolution During the Progress of a Practice Execution



The dynamics of the context appear at the practice level when the focus of attention moves. The contextual knowledge and the proceduralized context evolve as a practice execution progresses along a path. For example, consider the upper path in Figure 1: {A1, A2, A3, A5, A9}. Its context evolves as follows along the practice execution. Each line of Table 1 represents a step in the application of the practice, a step corresponding to a change in the context:



Table 1. Dynamics of the context along the upper path in Figure 1 



Line   Context from   Contextual knowledge   Proceduralized context
1      ∅              {C1, C2, C3, C4}       ∅
2      C1             {C2, C3, C4}           {C1.1}
3      C2             {C3, C4}               {C1.1, C2.1}
4      C3             {C4}                   {C1.1, C2.1, C3.1}
5      R3             {C3, C4}               {C1.1, C2.1}
6      R2             {C2, C3, C4}           {C1.1}
7      R1             {C1, C2, C3, C4}       ∅



The movement between the contextual knowledge and the proceduralized context follows the rule "last in, first out." The contextual elements are instantiated in the order C1, C2, C3 (entering the proceduralized context) and return to the contextual knowledge in the order C3, C2, C1. The progress of the practice execution up to an item is itself an element of the context. Thus, two contexts having the same contextual knowledge and proceduralized context (as at lines 3 and 5 of Table 1) differ by their history in the practice.
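The last-in, first-out behavior amounts to managing the proceduralized context as a stack. The Python sketch below is our own illustration. It assumes the path encoding of the Figure 1 example (`Cj.k` for an instantiated contextual node, `Rj` for its recombination node), replays the upper path of Figure 1, and reproduces the seven lines of Table 1, with `start` standing for line 1.

```python
from typing import List, Tuple

Row = Tuple[str, List[str], List[str]]   # (item, contextual knowledge, proceduralized context)


def context_trace(path: List[str], elements: List[str]) -> List[Row]:
    """Replay a practice and record the context after each contextual or
    recombination node, using a stack for the proceduralized context."""
    stack: List[str] = []                      # proceduralized context (LIFO)
    rows: List[Row] = [("start", list(elements), [])]
    for item in path:
        if item.startswith("C"):               # instantiation: push Cj.k
            stack.append(item)
        elif item.startswith("R"):             # recombination: must close the last Cj
            if not stack or stack[-1].split(".")[0] != "C" + item[1:]:
                raise ValueError(f"{item} does not close the last contextual node")
            stack.pop()
        else:                                  # actions leave the context unchanged
            continue
        active = {value.split(".")[0] for value in stack}
        knowledge = [e for e in elements if e not in active]
        rows.append((item.split(".")[0], knowledge, list(stack)))
    return rows


PATH = ["A1", "A2", "C1.1", "C2.1", "C3.1", "A3", "R3", "A5", "R2", "R1", "A9"]
for line, row in enumerate(context_trace(PATH, ["C1", "C2", "C3", "C4"]), 1):
    print(line, *row)
```

Lines 3 and 5 come out identical except for the item that produced them, which is the history difference noted above.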



3.4 Incremental Knowledge Acquisition in Contextual Graphs 

An important part of our system is the identification of the sequence of actions an operator used to solve a problem. This is achieved through interaction between the operator and the system, via a graphical representation of the current state of the contextual graph. Once a problem is solved, the operator reports the problem solving by providing the system with the action sequence used. The operator then tells the system which known practice is closest to the entered sequence of actions.

The entered sequence may or may not be a known sequence. The system determines which by matching the actions of the two sequences in order. Once a discrepancy is detected (an action that is different, new or missing between the two sequences), the system asks the operator for the reason for the difference. The operator provides the system with:
- the missing contextual element (its definition);
- its location (the positions of the contextual and recombination nodes on the path);
- its instantiations for the known practice and for the entered practice.

The contextual element that is added generally comes from external knowledge: its instantiation was not relevant before, but it is instantiated in a specific way in the new practice. Thus, the movement from external knowledge to the contextual knowledge of a contextual graph goes through its use in a proceduralized context. This is a way to tackle the infinite dimension of context, because external knowledge is considered only when needed [14].
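The ordered matching step can be sketched as follows. The authors do not give the matching procedure (they note that the complete algorithm is still under study). `first_discrepancy` is therefore a hypothetical helper of our own. It uses a simple one-step look-ahead to classify the first difference as a different, new or missing action.

```python
from typing import List, Optional, Tuple


def first_discrepancy(known: List[str], entered: List[str]) -> Optional[Tuple[int, str]]:
    """Compare two action sequences in order; return (position, kind) for the first
    difference, kind being "different", "new" or "missing", or None if identical."""
    for i, (k, e) in enumerate(zip(known, entered)):
        if k == e:
            continue
        if i + 1 < len(entered) and entered[i + 1] == k:
            return i, "new"            # entered inserts an extra action at position i
        if i + 1 < len(known) and known[i + 1] == e:
            return i, "missing"        # entered skips known[i]
        return i, "different"          # another action at the same position
    shorter = min(len(known), len(entered))
    if len(entered) > len(known):
        return shorter, "new"          # extra actions at the end
    if len(entered) < len(known):
        return shorter, "missing"      # entered stops early
    return None


known = ["A1", "A2", "A3", "A5", "A9"]
print(first_discrepancy(known, ["A1", "A2", "A4", "A5", "A9"]))
```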

Thus, contextual graphs can evolve by accommodation and assimilation of practices. A new practice differs from a known practice in the CxG by a few elements (generally one action). As a consequence, a contextual graph will hold more and more practices, as a kind of corporate memory. Acquiring a new practice corresponds to adding the minimum number of elements to a contextual graph (generally one contextual node/recombination node pair and one action). There is never any need to copy a large part of the contextual graph, as there is in a decision tree [15]. (The complete algorithm is currently under study.)
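To make the "minimum number of elements" concrete, here is a sketch of the structural change. It assumes a list-of-items encoding with a hypothetical `ContextualNode` class whose recombination node is implicit at the end of the node. When the entered practice differs from the known one by a single action, only that action is wrapped in a new contextual/recombination pair, and the rest of the sequence is reused unchanged. The element name `C5` and its values are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class ContextualNode:
    name: str                                  # the new contextual element, e.g. "C5"
    branches: Dict[str, List["Item"]]          # instantiation -> items up to Rj


Item = Union[str, ContextualNode]              # a plain string is an action


def add_practice(seq: List[Item], index: int, element: str,
                 known_value: str, new_value: str, new_action: str) -> List[Item]:
    """Return a new sequence in which the action at `index` is replaced by one
    contextual/recombination pair: one branch keeps the known action, the other
    holds the operator's new action. The input sequence is not modified."""
    node = ContextualNode(element, {known_value: [seq[index]], new_value: [new_action]})
    return seq[:index] + [node] + seq[index + 1:]


known = ["A1", "A2", "A3", "A5"]
updated = add_practice(known, 2, "C5", "C5.1", "C5.2", "A4")
print(updated)
```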



4 Explanation Generation in Contextual Graphs 

Explanation generation used to be based on domain knowledge, i.e. on the task at hand, which in our case means the actions (definition, input, output). For about ten years it has been known that such explanations bring little to the user and nothing to operators, because they take no account of context [6]. In contextual graphs, context is represented explicitly and knowledge is acquired in its context of use. An explanation can therefore be generated from all the items in a contextual graph (contextual elements, actions, activities, their ordering, etc.).

The explanation of a practice mainly presents the different contextual elements intervening along the path, the order in which they intervene, their temporary instantiations, and the chronological order in which they were incorporated into the contextual graph. The explanation of an action in a practice relies mainly on the proceduralized context of this action. The system can thus explain the reasoning held in the practice (up to this action) by presenting:
(1) the contextual elements explicitly used in the practice up to the action;
(2) the instantiations of these contextual elements;
(3) the order in which the different contextual elements are instantiated;
(4) most importantly, the order in which (and the reasons why) the contextual elements were introduced into the contextual graph.

These last two points are a way to take into account, in the explanation, the context dynamics leading to the action to be explained. This leads us to view explanation as a process that progresses along with the reasoning, rather than as something derived from known, static factors. Our position is close to Leake's position [13] on explanation in case-based reasoning. However, the explanation generated in a contextual graph is available at different levels of detail (the proceduralized context, or the order of and reasons for the introduction of each contextual element).
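A sketch of how points (1) to (4) could be turned into text. The record fields `added_at` and `reason` are hypothetical stand-ins for the history the prototype keeps for each item. The function name and text layout are our own, not the authors'.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instantiation:
    element: str      # contextual element, e.g. "C2"
    value: str        # its instantiation on the path, e.g. "C2.1"
    added_at: int     # hypothetical: rank at which the element entered the graph
    reason: str       # hypothetical: why an operator introduced it


def explain_action(action: str, proceduralized: List[Instantiation]) -> str:
    """Explain `action` from its proceduralized context: elements and values in
    instantiation order (points 1-3), then in order of introduction (point 4)."""
    lines = [f"{action} is executed in the following context:"]
    for i, inst in enumerate(proceduralized, 1):
        lines.append(f"  {i}. {inst.element} = {inst.value}")
    lines.append("These elements were introduced into the graph in this order:")
    for inst in sorted(proceduralized, key=lambda x: x.added_at):
        lines.append(f"  - {inst.element}: {inst.reason}")
    return "\n".join(lines)


ctx = [Instantiation("C1", "C1.1", 2, "reason given for C1"),
       Instantiation("C2", "C2.1", 1, "reason given for C2")]
print(explain_action("A3", ctx))
```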

As the system and the user interact on the same contextual graphs, each can provide the other with relevant explanations. When the user provides explanations, the system enters a phase of incremental acquisition of practices that will later improve its reasoning. In this way, the task at hand, incremental acquisition and the generation of explanations must be intertwined, an important issue in cooperation [5]. Conversely, explanations enable contextual knowledge to be proceduralized at the right place by supporting the process of incremental acquisition of practices.

Moreover, the mechanisms of aggregation and expansion, as in conceptual graphs [20, 21], allow the user to focus on one part of the contextual graph or another, according to his focus of attention. For example, it is possible to study parts of a reasoning and all its variants (the practices). This is particularly useful for understanding the differences between two practices, the role played by a contextual element in the choice of one action over another, etc.

In conclusion, the context-based formalism called contextual graphs gives a uniform representation of series of actions and of contextual elements. Explanations are therefore easier to produce, because the knowledge on which they rely is explicit in the representation. Contextual elements explain why one action was chosen over another. With the history of changes in a contextual graph, it is possible to produce different types of explanation.

At the level of a practice, explanation is a way to present:
- the progress of the application of the practice;
- the movement between the contextual knowledge and the proceduralized context;
- the changes between the practice and the previous one;
- the variants added afterwards.

Explanations can also be generated at another level, to present contextual information such as the creation date of the practice, its author, the problem solving that first required this change, etc.

With the graphical interface for representing contextual graphs, the system can generate visual explanations of the path from the source to a given element, of the ways in which a practice has been progressively specialized, and of the growth of a contextual graph (the incremental addition of practices, the practices introduced by a given operator, etc.). Made possible by the incremental acquisition of practices, this is a new way to generate explanations.

This shows that the task at hand, the incremental acquisition of practices and explanation generation are three aspects of the same thing (the task at hand in the broad sense).



5 Discussion 

Context-based formalisms allow knowledge and reasoning to be represented in a way that users can understand directly. The structures in a contextual graph put actions and activities (complex action structures) at the same level. Thus, two people who interpret the same activity at different levels can understand each other. For example, "Empty the train of travelers" is interpreted as a simple action by the operator responsible for the subway line, but as a complex activity by the driver: stop at the next station, tell travelers to leave the train, go and check that nobody is still in the train, close the doors and leave the station.

At the action level, making contextual elements explicit makes it possible to explain why one action was chosen over another. Thus, the information in contextual graphs is useful and usable for operators.

After a contextual graph has been used for a while, most of the possible practices would be recorded. Analyzing the whole contextual graph (the initial procedures and all the practices) would then allow the company to improve its procedures, and thus to give operators' practices more weight relative to overly general procedures. The expressiveness of the practice representation and the explanation capability of contextual graphs also have consequences for training:
(1) training future operators, by discussing the subtleties of accomplishing the task;
(2) training current operators, by exchanging and discussing experiences.
In contextual graphs, incremental acquisition of practices and explanation generation are naturally intertwined with the task at hand. Moreover, the progressive evolution of the contextual graph and its graphical representation interact continuously. They thus cannot be separated as software engineering rules would require (separation of the interface from the program).

There are, however, some limits to this context-based representation at the level of the parallel action grouping. In the coffee-preparation example (see [4]), we observe a dense net of contextual nodes for selecting among the three actions "Take reservoir", "Take filter" and "Take coffee". These actions can be executed in a sequence that depends on several factors:
- where the items are (e.g. filters are in the cupboard and coffee in the refrigerator);
- the user's preferences (if I take the coffee box second, I just have to keep it in my hand to put coffee in the filter, which will have been made ready just before);
- the type of machine.

On the last point, on some machines the filter is put on the reservoir, whereas on others it is fixed on the body of the machine. In the former case, it is necessary to first pour water into the reservoir, install the filter on the reservoir, and then put coffee in the filter.

In the same spirit, it could be important to make explicit the number of people interested in the coffee preparation (the choice of machine depends on it). The places where things are and their relative locations may also matter: place (home or work), filter (in the cupboard or near the coffee machine), coffee (near the coffee machine or in the refrigerator), reservoir (near the coffee machine or in the "kitchen or wash room"). So may the relationships between things. For example, is the filter on the receptacle or not? Is the cup part of the receptacle itself? Must we take the whole coffee machine to fill the reservoir with water? The availability of resources in the operating environment (water, coffee, electricity source), etc. also intervenes implicitly at the level of the parallel action grouping. All these factors constitute a set of choices that can differ from one day to another (optimization of movements, of the duration of the operation, of the number of operations).

Another weakness of the context-based representation concerns the representation of time. Currently, time is represented by the fact that contextual graphs are directed and that there is an ordering of the actions to execute. However, it is not possible to represent the fact that an action must be accomplished, say, ten minutes before another one starts. For example, one may need to glue two objects in succession, the second only once the first is firmly stuck.

Even now, however, contextual graphs offer some potential to exploit. Clearly, the more a system based on contextual graphs is used, the more corporate memory it will preserve. As a side effect, procedures can be revised according to all the variants developed by operators, and the new procedures would be more robust. Since a contextual graph could describe all the ways in which something can be used (say, access to a server), it could then be possible to identify what a user is doing, determine secure and sensible access paths, and forbid sensitive ones. We are currently studying such uses of contextual graphs.
A Deduction Theorem for Normal Modal
Propositional Logic 



Sasa Buvac 
Abstract. We develop a Hilbert-style calculus for modal propositional logic which allows for the deduction theorem. Labels are used to keep track of both the modalities that have been entered and the assumptions that have been made. The main technical result of this paper is the equivalence of the labelled deductive calculus to the normal calculus for modal propositional logic.



We assume a standard modal propositional language: a standard propositional language with a countable number of modalities, designated by placing an integer in square brackets in front of a formula. By convention $\phi$, $\psi$, and $\chi$ range over formulae, $\Gamma$ ranges over sets of formulae, and $n$, $m$, and $i$ range over integers. A normal calculus, a staple of modal logics since [Q, is now defined in the usual way:

Definition (derivation): a formula $\phi$ is derivable, written

$\vdash \phi$

iff $\phi$ is an element of the least set which contains all the instances of the following axiom schemata:

(PL) $\phi$ provided $\phi$ is a propositional tautology
(K) $[n](\phi \supset \psi) \supset [n]\phi \supset [n]\psi$

and is closed under the following rules of inference: 

(MP) from $\phi$ and $\phi \supset \psi$ infer $\psi$
(RN) from $\phi$ infer $[n]\phi$.

Although quite elegant, a normal calculus does not allow for the deduction theorem.¹ We define a new calculus, for which we write $\Gamma \vdash: \phi$ and say that $\phi$ is deducible from $\Gamma$, and prove a deduction theorem:

¹ We could, of course, extend derivability of a normal calculus to allow for assumptions: $\Gamma \vdash \phi$ if $\phi$ is an element of the least superset of $\Gamma$ which contains PL and K and is closed under MP and RN. Then the deduction theorem, $\Gamma, \phi \vdash \psi \Rightarrow \Gamma \vdash \phi \supset \psi$, implies the derivability of a typically unacceptable schema

$\phi \supset [n]\phi$

(the converse of schema T): start with $\phi \vdash \phi$, apply RN to get $\phi \vdash [n]\phi$, and then apply the deduction theorem.
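Written out step by step, with no further assumptions, the footnote's argument in the extended normal calculus is:

```latex
\begin{array}{lll}
1. & \phi \vdash \phi            & \text{(assumption)} \\
2. & \phi \vdash [n]\phi         & \text{(RN, 1)} \\
3. & \vdash \phi \supset [n]\phi & \text{(deduction theorem, 2)}
\end{array}
```

Step 2 is where the trouble lies: RN is applied to a formula that is merely assumed rather than derived.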



P. Blackburn et al. (Eds.): CONTEXT 2003, LNAI 2680, pp. 107-^32003. 
Theorem (deduction): $\Gamma, \phi \vdash: \psi \iff \Gamma \vdash: \phi \supset \psi$.

To ensure that our deductive calculus indeed calculates modal propositional logic 
we show its equivalence to the normal calculus: 

Theorem (normalization): $\vdash \phi \iff\ \vdash: \phi$.

The basic idea is to introduce labels to keep track of both modalities that have 
been entered and assumptions that have been made. 

Definition (label): a label is any finite sequence of formulae and modalities. By convention, the letters $b$ and $c$ range over labels.

With this notion of label in place we now define our deductive calculus. 

Definition (deduction): a formula $\phi$ is deducible with a label $c$ from assumptions $\Gamma$, written

$\Gamma \vdash c : \phi$

iff the tuple $(c, \phi)$ is an element of the least superset of $\{\epsilon\} \times \Gamma$ which contains all the instances of the following axiom schemata:

(pl) $c : \phi$ provided $\phi$ is a propositional tautology
(k) $c : [n](\phi \supset \psi) \supset [n]\phi \supset [n]\psi$

and is closed under the following rules of inference: 

(mp) from $c : \phi$ and $c : \phi \supset \psi$ infer $c : \psi$
(exit) from $c, [n] : \phi$ infer $c : [n]\phi$
(enter) from $c : [n]\phi$ infer $c, [n] : \phi$
(assume) from $c : \psi \supset \phi$ infer $c, \psi : \phi$
(discharge) from $c, \psi : \phi$ infer $c : \psi \supset \phi$.

We write $\Gamma \vdash: \phi$ for $\Gamma \vdash \epsilon : \phi$, where $\epsilon$ is the empty sequence, and say that $\phi$ is deducible from $\Gamma$.
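As an illustration (our own example, not taken from the paper), the rules above combine as follows to deduce $\vdash: [n](\phi \wedge \psi) \supset [n]\phi$ from no assumptions. Enter and exit move the modality into and out of the label, while assume and discharge do the same for the hypothesis:

```latex
\begin{array}{lll}
1. & \epsilon : [n](\phi \wedge \psi) \supset [n](\phi \wedge \psi) & \text{(pl)} \\
2. & [n](\phi \wedge \psi) : [n](\phi \wedge \psi)                   & \text{(assume, 1)} \\
3. & [n](\phi \wedge \psi), [n] : \phi \wedge \psi                   & \text{(enter, 2)} \\
4. & [n](\phi \wedge \psi), [n] : (\phi \wedge \psi) \supset \phi    & \text{(pl)} \\
5. & [n](\phi \wedge \psi), [n] : \phi                               & \text{(mp, 3, 4)} \\
6. & [n](\phi \wedge \psi) : [n]\phi                                 & \text{(exit, 5)} \\
7. & \epsilon : [n](\phi \wedge \psi) \supset [n]\phi                & \text{(discharge, 6)}
\end{array}
```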

1 Proofs of Theorems 

We first introduce and investigate the notion of labelling which is needed in the 
proofs of both theorems. 

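Definition (labelling): a labelling of a formula $\phi$ with a label $c$, written

$c \rhd \phi$

is the formula defined inductively to be

$\epsilon \rhd \phi = \phi$
$c, [n] \rhd \phi = c \rhd [n]\phi$
$c, \chi \rhd \phi = c \rhd (\chi \supset \phi)$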
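The labelling operation is simple enough to execute. The Python sketch below uses a hypothetical encoding of our own: atoms as strings, `("box", n, phi)` for $[n]\phi$, `("imp", phi, psi)` for $\phi \supset \psi$, and `("mod", n)` for a modality inside a label. It assumes the three inductive clauses $\epsilon \rhd \phi = \phi$, $c,[n] \rhd \phi = c \rhd [n]\phi$ and $c,\chi \rhd \phi = c \rhd (\chi \supset \phi)$, which are the ones the proofs below invoke whenever they cite the definition of labelling. With it, one can check instances of the two equalities used in the proof of the labelling lemma.

```python
from typing import List, Tuple, Union

Formula = Union[str, tuple]                  # atom, ("box", n, phi) or ("imp", phi, psi)
LabelItem = Union[Tuple[str, int], Formula]  # ("mod", n) or an assumed formula


def box(n: int, phi: Formula) -> Formula:
    return ("box", n, phi)                   # [n]phi


def imp(phi: Formula, psi: Formula) -> Formula:
    return ("imp", phi, psi)                 # phi implies psi


def mod(n: int) -> Tuple[str, int]:
    return ("mod", n)                        # the modality [n] as a label item


def labelling(c: List[LabelItem], phi: Formula) -> Formula:
    """Compute c |> phi by peeling label items off the right end of c."""
    if not c:
        return phi                                   # eps |> phi = phi
    *rest, last = c
    if isinstance(last, tuple) and last[0] == "mod":
        return labelling(rest, box(last[1], phi))    # c,[n] |> phi = c |> [n]phi
    return labelling(rest, imp(last, phi))           # c,chi |> phi = c |> (chi -> phi)


c = [mod(1), "q", mod(2)]
print(labelling(c, "p"))                             # [1](q -> [2]p) in tuple form
```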
Lemma (labelling):

1. $\vdash \phi \Rightarrow\ \vdash c \rhd \phi$

2. $\Gamma \vdash b : c \rhd \phi \iff \Gamma \vdash b, c : \phi$.



Furthermore, the following is a derived rule of the normal calculus:

(MPc) from $c \rhd \phi$ and $c \rhd (\phi \supset \psi)$ infer $c \rhd \psi$.

It is derived by induction on the structure of $c$. The base case, for $c = \epsilon$, is just MP. We assume the rule holds for $c$ and show it first for $c, [n]$, and later for $c, \chi$.

Case (from $c, [n] \rhd \phi$ and $c, [n] \rhd (\phi \supset \psi)$ infer $c, [n] \rhd \psi$): We begin by assuming

$c, [n] \rhd (\phi \supset \psi)$.

Therefore, by definition of labelling,

$c \rhd [n](\phi \supset \psi)$.

By inductive hypothesis and labelling lemma 1 applied to K we now get

$c \rhd ([n]\phi \supset [n]\psi)$.

Together with $c \rhd [n]\phi$ (which follows from $c, [n] \rhd \phi$ by definition of labelling), by inductive hypothesis we get

$c \rhd [n]\psi$.

Therefore, by definition of labelling,

$c, [n] \rhd \psi$.



Case (from $c, \chi \rhd \phi$ and $c, \chi \rhd (\phi \supset \psi)$ infer $c, \chi \rhd \psi$): Begin by assuming

$c, \chi \rhd (\phi \supset \psi)$.

Therefore, by definition of labelling,

$c \rhd (\chi \supset (\phi \supset \psi))$.

By inductive hypothesis and propositional logic we now get

$c \rhd ((\chi \supset \phi) \supset (\chi \supset \psi))$.

Together with $c \rhd (\chi \supset \phi)$ (which follows from $c, \chi \rhd \phi$ by definition of labelling), by inductive hypothesis we get

$c \rhd (\chi \supset \psi)$.

Therefore, by definition of labelling,

$c, \chi \rhd \psi$.



Proof (labelling): We first state and prove, in turn, two equalities which are needed in the proof of the lemma. The first equality,

$[n](c \rhd \phi) = [n], c \rhd \phi$,

is proved by induction on the structure of $c$. The base case, for $c = \epsilon$, is trivial. For the inductive hypothesis, assume the equality for $c$.

We begin with $[n](c, [m] \rhd \phi)$. By definition of labelling this is equal to $[n](c \rhd [m]\phi)$. By inductive hypothesis, this is $[n], c \rhd [m]\phi$, which, again by definition of labelling, is equal to $[n], c, [m] \rhd \phi$.

Next, we begin with $[n](c, \chi \rhd \phi)$. By definition of labelling this is equal to $[n](c \rhd (\chi \supset \phi))$. By inductive hypothesis, this is $[n], c \rhd (\chi \supset \phi)$, which, again by definition of labelling, is equal to $[n], c, \chi \rhd \phi$.

The second equality,

$\psi \supset (c \rhd \phi) = \psi, c \rhd \phi$,

is proved in a similar way, by induction on the structure of $c$. The base case, for $c = \epsilon$, is trivial. For the inductive hypothesis, assume the equality for $c$.

We begin with $\psi \supset (c, [m] \rhd \phi)$. By definition of labelling this is equal to $\psi \supset (c \rhd [m]\phi)$. By inductive hypothesis, this is $\psi, c \rhd [m]\phi$, which, again by definition of labelling, is equal to $\psi, c, [m] \rhd \phi$.

Next, we begin with $\psi \supset (c, \chi \rhd \phi)$. By definition of labelling this is equal to $\psi \supset (c \rhd (\chi \supset \phi))$. By inductive hypothesis, this is $\psi, c \rhd (\chi \supset \phi)$, which, again by definition of labelling, is equal to $\psi, c, \chi \rhd \phi$.

We now turn to the first part of the lemma.

Case (1): The proof is by induction on the structure of $c$. The base case, where $c = \epsilon$, is trivial. For the inductive hypothesis, we assume $\vdash c \rhd \phi$. We consider two inductive cases.

To prove $\vdash [n], c \rhd \phi$, we use RN on the inductive assumption $\vdash c \rhd \phi$ to get $\vdash [n](c \rhd \phi)$, which in turn by the first of the above equalities gives $\vdash [n], c \rhd \phi$.

The other inductive case concerns $\vdash \psi, c \rhd \phi$. Assume again $\vdash c \rhd \phi$. Therefore, by propositional logic, $\vdash \psi \supset (c \rhd \phi)$. Now the second of the above equalities gives $\vdash \psi, c \rhd \phi$.

Case (2): This proof is by induction on the structure of $b$. The base case, where $b = \epsilon$, is again trivial. For the inductive hypothesis we assume the lemma holds for $b$, and we consider two inductive cases. In the first case we begin with

$\Gamma \vdash b, [n] : c \rhd \phi$.

By the exit and enter rules this is equivalent to

$\Gamma \vdash b : [n](c \rhd \phi)$.

Now, the first equality above gives

$\Gamma \vdash b : [n], c \rhd \phi$,

and by the inductive hypothesis, we get equivalence to

$\Gamma \vdash b, [n], c : \phi$.

In the second case we begin with

$\Gamma \vdash b, \psi : c \rhd \phi$.

By the assume and discharge rules this is equivalent to

$\Gamma \vdash b : \psi \supset (c \rhd \phi)$.

Now, the second equality above gives

$\Gamma \vdash b : \psi, c \rhd \phi$,

and again by the inductive hypothesis, we get equivalence to

$\Gamma \vdash b, \psi, c : \phi$.



[labelling] 



We now turn to showing the deduction theorem. 

Proof (deduction): We prove each direction in turn, starting with the shorter proof.

Case (if): This direction is just mp, observing the following two structural rules:

$\Gamma, \phi \vdash: \phi$ and $\Gamma \vdash: \chi \Rightarrow \Gamma, \phi \vdash: \chi$.

We let $\chi$ in the latter be $\phi \supset \psi$ to get

$\Gamma \vdash: \phi \supset \psi \Rightarrow \Gamma, \phi \vdash: \phi \supset \psi$.

The left hand side is the premise. The right hand side, together with the former structural rule $\Gamma, \phi \vdash: \phi$, gives via mp

$\Gamma, \phi \vdash: \psi$.



Case (only if): We prove the equivalent form

$\Gamma, \phi \vdash c : \psi \Rightarrow \Gamma \vdash \phi, c : \psi$.

(The only if direction of the deduction theorem follows from the above with $c = \epsilon$ via one application of discharge.) The proof is by induction on the structure of the deduction of $c : \psi$.



Case (base with $\phi \neq c \rhd \psi$): $c : \psi$ must be an instance of an axiom, in which case $\phi, c : \psi$ is too.



Case (base with $\phi = c \rhd \psi$): We begin with an instance of the propositional tautology

$\Gamma \vdash: \phi \supset \phi$.

By assume we get

$\Gamma \vdash \phi : \phi$.

Since $\phi = c \rhd \psi$, this is

$\Gamma \vdash \phi : c \rhd \psi$.

Therefore, by the labelling lemma 2,

$\Gamma \vdash \phi, c : \psi$.



Case (mp): Assume

$\Gamma, \phi \vdash c : \psi \Rightarrow \Gamma \vdash \phi, c : \psi$
$\Gamma, \phi \vdash c : \psi \supset \chi \Rightarrow \Gamma \vdash \phi, c : \psi \supset \chi$

and that $\Gamma, \phi \vdash c : \chi$ had been deduced from the left hand sides of the two inductive hypotheses above via mp. Then $\Gamma \vdash \phi, c : \chi$ can be deduced from the right hand sides of the hypotheses, also via mp.

Case (exit): Assume

$\Gamma, \phi \vdash c, [n] : \psi \Rightarrow \Gamma \vdash \phi, c, [n] : \psi$

and that $\Gamma, \phi \vdash c : [n]\psi$ had been deduced from the left hand side of the inductive hypothesis above via exit. Then $\Gamma \vdash \phi, c : [n]\psi$ can be deduced from the right hand side of the inductive hypothesis, also via exit.

The proof is similar for the other rules of inference. [deduction]

To prove normalization we need to manipulate derivations and deductions 
as objects; to this end we introduce the notion of a formula tree: a tree whose 
every node contains a formula. By convention A, B, and C range over formula 
trees. We establish the following notation for formula trees: 

$A^{\phi}$ means that A is a formula tree with root $\phi$;

$A_{\phi}$ means that A is a formula tree with a single node, and that node is $\phi$;

$A^{\phi}_{B}$ means that A is a formula tree with root $\phi$ and exactly one branch: the formula tree B;

$A^{\phi}_{B\,C}$ means that A is a formula tree with root $\phi$ and exactly two branches: the formula trees B and C.

To illustrate our notation we observe that $A^{\phi}$ implies one of the following: $A_{\phi}$, or that there exists a B such that $A^{\phi}_{B}$, or that there exist B and C such that $A^{\phi}_{B\,C}$.

We extend the above meanings of superscripts and subscripts on formula trees to trees whose every node contains a $(c, \phi)$ pair; in such cases we use $\alpha$, $\beta$, and $\gamma$ instead of A, B, and C.
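The tree notation can be mirrored by a small data structure. The Python sketch below is only an illustration under our own assumptions: the class name `Tree`, its fields and the plain-string encoding of formulas are hypothetical, and a deduction would simply store $(c, \phi)$ pairs at its nodes instead of formulas.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tree:
    """A formula tree: a root formula and a tuple of branches (sub-trees)."""
    root: object
    branches: Tuple["Tree", ...] = ()


def shape(t: Tree) -> str:
    """Name the case of the superscript/subscript notation that t falls under."""
    if len(t.branches) == 0:
        return "single node"     # A_phi
    if len(t.branches) == 1:
        return "one branch"      # A^phi_B
    if len(t.branches) == 2:
        return "two branches"    # A^phi_{B C}
    raise ValueError("the rules considered here have at most two premises")


# A^q with branches B^p and C^(p > q), as produced by one use of MP.
example = Tree("q", (Tree("p"), Tree("p > q")))
print(shape(example), [shape(b) for b in example.branches])
```

The three cases of `shape` correspond to the three alternatives listed above for $A^{\phi}$.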

Proof(normalization): We show that every derivation can be transformed into a deduction and vice versa.

When transforming a derivation into a deduction we need to keep track of 
all the applications of RN starting from the leaf nodes. 



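Definition(@):

$A_{\phi} @ c = \alpha_{c:\phi}$

$A^{\phi}_{B\,C} @ c = \alpha^{c:\phi}_{(B @ c)\,(C @ c)}$

$A^{[n]\phi}_{B} @ c = \alpha^{c:[n]\phi}_{B @ c,[n]}$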

Lemma(deduction construction): if $A^{\phi}$ is a derivation then $\alpha^{c:\phi} = A @ c$ is a deduction, for any c.

Intuitively, to transform a deduction into a derivation we simply replace every 
: with a □. 

Definition([[·]]):




Lemma(derivation construction): if $\alpha^{c:\phi}$ is a deduction then $A^{c \square \phi} = [[\alpha]]$ is a derivation. [normalization]

All that remains to be proved are the two construction lemmas. 

Proof(deduction construction): Construction is by induction on the structure of the derivation. The base case is trivial, as every axiom of the normal calculus is an axiom with any label of the deductive calculus.

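Case(MP): Assume that

1. if $B^{\psi}$ is a derivation then $\beta^{c:\psi} = B @ c$ is a deduction

2. if $C^{\psi \supset \phi}$ is a derivation then $\gamma^{c:\psi \supset \phi} = C @ c$ is a deduction

3. $A^{\phi}_{B\,C}$ is a derivation.

Following the definition of @ we let

$A^{\phi}_{B\,C} @ c = \alpha^{c:\phi}_{(B @ c)\,(C @ c)}$.

Since $A^{\phi}_{B\,C}$ is a derivation, then so are both $B^{\psi}$ and $C^{\psi \supset \phi}$, and therefore, by inductive hypothesis, we get that $\beta^{c:\psi} = B @ c$ and $\gamma^{c:\psi \supset \phi} = C @ c$. Thus

$A^{\phi}_{B\,C} @ c = \alpha^{c:\phi}_{\beta\,\gamma}$.

Furthermore, by inductive hypothesis we get that $\beta^{c:\psi}$ and $\gamma^{c:\psi \supset \phi}$ are both deductions; therefore by mp $\alpha^{c:\phi}_{\beta\,\gamma}$ is a deduction, which completes this case as the choice of c was arbitrary.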



Case(RN): Assume that

1. if $B^{\phi}$ is a derivation then $\beta^{c,[n]:\phi} = B @ c,[n]$ is a deduction

2. $A^{[n]\phi}_{B}$ is a derivation.

Following the definition of @ we let

$A^{[n]\phi}_{B} @ c = \alpha^{c:[n]\phi}_{B @ c,[n]}$.

Since $A^{[n]\phi}_{B}$ is a derivation, then so is $B^{\phi}$; therefore, by inductive hypothesis we get that $\beta^{c,[n]:\phi} = B @ c,[n]$. Thus

$A^{[n]\phi}_{B} @ c = \alpha^{c:[n]\phi}_{\beta}$.

Furthermore, by inductive hypothesis we get that $\beta^{c,[n]:\phi}$ is a deduction; therefore by exit $\alpha^{c:[n]\phi}_{\beta}$ must also be a deduction, which completes this case as the choice of c was arbitrary. [deduction construction]
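The translation @ is a straightforward recursion on the derivation, and the proof above follows exactly that recursion. The following Python sketch is our own illustration, not code from the paper: the `Node` class, the rule names and the tuple encoding of formulas (atoms as strings, `("imp", a, b)` for $a \supset b$, `("box", n, a)` for $[n]a$, labels as tuples of context indices) are all assumptions, and the sketch does not check that leaves really are axiom instances.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    rule: str              # "ax"; "MP"/"RN" in derivations; "mp"/"exit" in deductions
    conclusion: object     # a formula, or a (label, formula) pair in a deduction
    premises: Tuple["Node", ...] = ()


def at(a: Node, c: tuple) -> Node:
    """A @ c: turn a derivation of phi into a deduction of c : phi."""
    phi = a.conclusion
    if a.rule == "ax":
        # an axiom of the normal calculus is an axiom with any label
        return Node("ax", (c, phi))
    if a.rule == "MP":
        b, cc = a.premises                  # B^psi and C^(psi ⊃ phi)
        return Node("mp", (c, phi), (at(b, c), at(cc, c)))
    if a.rule == "RN":
        (b,) = a.premises                   # B^psi where phi = [n]psi
        _, n, _psi = phi
        return Node("exit", (c, phi), (at(b, c + (n,)),))
    raise ValueError(f"unknown rule {a.rule!r}")


# RN applied to an axiom instance p ⊃ p, translated at the empty label.
taut = ("imp", "p", "p")
d = Node("RN", ("box", 1, taut), (Node("ax", taut),))
print(at(d, ()))
```

Note how the RN case extends the label with $[n]$ before recursing, which is the bookkeeping of applications of RN mentioned at the start of the normalization proof.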



Proof(derivation construction): Construction is by induction on the structure of the deduction.

Case(base): Assume that $\alpha_{c:\phi}$ is a deduction. Then $c : \phi$ is an instance of pl or k. Therefore, $\phi$ is an instance of PL or K, and thus we get $\vdash \phi$. Now labelling lemma 1 gives $\vdash c \,\square\, \phi$.

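Case(mp): Assume that

1. if $\beta^{c:\psi}$ is a deduction then $B^{c \square \psi} = [[\beta]]$ is a derivation

2. if $\gamma^{c:\psi \supset \phi}$ is a deduction then $C^{c \square (\psi \supset \phi)} = [[\gamma]]$ is a derivation

3. $\alpha^{c:\phi}_{\beta\,\gamma}$ is a deduction.

Following the definition of [[·]] we let

$[[\alpha]] = A^{c \square \phi}_{[[\beta]]\,[[\gamma]]}$.

The last assumption implies that both $\beta$ and $\gamma$ are deductions, and the first two assumptions thus yield that $B^{c \square \psi} = [[\beta]]$ and $C^{c \square (\psi \supset \phi)} = [[\gamma]]$. Therefore

$[[\alpha]] = A^{c \square \phi}_{B\,C}$.

From the same combination of assumptions we also conclude that $B^{c \square \psi}$ and $C^{c \square (\psi \supset \phi)}$ are both derivations, and therefore by MP_c so is $A^{c \square \phi}_{B\,C}$.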



Case(exit): Assume that

1. if $\beta^{c,[n]:\phi}$ is a deduction then $B^{c,[n] \square \phi} = [[\beta]]$ is a derivation

2. $\alpha^{c:[n]\phi}_{\beta}$ is a deduction.

The latter assumption implies that $\beta$ is a deduction, and the former assumption thus yields that $[[\beta]] = B^{c,[n] \square \phi}$ is a derivation. We let $A = [[\alpha]]$ and then, by the definition of [[·]], $A = B^{c,[n] \square \phi}$ is a derivation. Therefore, by definition of labelling, so is $A^{c \square [n]\phi}$.

The other three cases are similar. [derivation construction]
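The inverse translation replaces every $c : \phi$ by the formula $c \,\square\, \phi$. Since the definition of labelling is not reproduced in this section, the Python sketch below rests on our reading of it, which is an assumption: the label is folded into the formula from its last item inwards, a context index $n$ contributes $[n]$, an assumption $\psi$ in the label contributes $\psi \supset (\ldots)$, and the empty label leaves $\phi$ unchanged. This reading makes $c,[n] \,\square\, \phi$ and $c \,\square\, [n]\phi$ the same formula, so an exit step disappears under [[·]], as in the exit case above. The names `label`, `interp`, `L1` and `MP_c` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    rule: str              # "ax", "mp", "exit" in deductions; "L1", "MP_c" in derivations
    conclusion: object     # (label, formula) in a deduction, a formula in a derivation
    premises: Tuple["Node", ...] = ()


def label(c: tuple, phi):
    """c □ phi under the assumed clauses described in the lead-in."""
    for item in reversed(c):
        if isinstance(item, int):
            phi = ("box", item, phi)          # [n] contributes [n]phi
        else:
            phi = ("imp", item[1], phi)       # ("assume", psi) contributes psi ⊃ phi
    return phi


def interp(alpha: Node) -> Node:
    """[[alpha]]: replace every c : phi in the deduction by c □ phi."""
    c, phi = alpha.conclusion
    if alpha.rule == "exit":
        (beta,) = alpha.premises
        b = interp(beta)
        # c,[n] □ psi and c □ [n]psi are literally the same formula
        if b.conclusion != label(c, phi):
            raise ValueError("ill-formed exit step")
        return b
    # a leaf stands for the derivation of c □ phi given by labelling lemma 1
    rule = {"ax": "L1", "mp": "MP_c"}[alpha.rule]
    return Node(rule, label(c, phi), tuple(interp(p) for p in alpha.premises))


ex = Node("exit", ((), ("box", 1, "p")), (Node("ax", ((1,), "p")),))
print(interp(ex))
```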



2 Related Works 

Combining the actions of entering and exiting with natural-deduction-style inference (assuming and discharging) was proposed, defined, and utilized for AI examples in [E]. However, the latter paper assumed but never proved the deduction theorem. Our deductive calculus is a quantifier-free version of Pj aimed at showing the deduction theorem. We have done this by distinguishing between modalities and labels, both of which had been grouped under the single category of context in

Adding labels to deductive systems has been studied by many authors in the past (see [Bf]); however, comparisons to all the resulting logics are beyond the scope of this paper.
Abstract. This note describes three formalized logics of context
and their mathematical inter-relationships. It also proposes a Natural 
Deduction formulation for a constructive logic of contexts, which is what 
the described logics have in common. 
1 Introduction

The word “context” has too many different meanings, so we should start by explaining that we are interested in logics of context designed to help automated reasoning in Artificial Intelligence (AI), more specifically, in knowledge representation. Thus we are interested in mathematically understanding and clarifying work that, starting with McCarthy’s seminal papers [McC96,McC93,McCB97], aims at giving the (informal) notion of context the role of a first-class object in a logical system.

Our goal is a mathematically well-behaved logical system that models the reasoning that happens when we say, for example, that in the context of Sherlock Holmes stories it is true that Sherlock Holmes lives in Baker Street, London. For a traditional mathematical logician, this informal notion of context is modeled by considering different logical theories, and the burden of deciding how these logical theories interact is shifted to the metalogic and the human reasoner. In this paper we take for granted that the reader has been convinced by McCarthy’s, Giunchiglia’s (and others’) arguments that context should be a first-class object in a logical system and that the question to be solved is which logical system one should use. Narrowing our focus, we concentrate not on deciding which logical system to use, but on the much smaller question of comparing, in terms of their mathematical properties, the systems¹ in the literature where context is modeled via a modality operator, usually written as ist(κ, A). Here the basic intuition is that formulas, such as A, are true not in absolute terms, but in certain contexts, in particular, in the context named by the constant κ. There

¹ A referee has rightly complained that we do not discuss how well these systems match the intuitions they are trying to model. While this task seems very important, this author does not have the right intuitions to carry it out. Moreover, the project [CC+02] that our theoretical investigation underpins has moved in a new direction.



P. Blackburn et al. (Eds.): CONTEXT 2003, LNAI 2680, pp. 116-129, 2003. 
are many reasons why this is a good idea for AI and these, as well as examples of the applications of these ideas, are discussed in the literature. But even narrowing down the problem to choosing between systems, and considering only the systems based on some kind of modality, the task is daunting. The literature on notions of context and on formalizations thereof, i.e. logics of context, is really vast [AS96]. This paper discusses three propositional² systems: Buvac and Mason’s propositional logic of contexts, henceforth PLC [BBM95], Nayak’s system (here called N), a logic of contexts for multiple domain theories [Nay94], and Massacci’s system T [Mas95], described as a tableaux version of PLC. The Trento group framework for logics of context, called LMS/MCS, for Local Model Semantics/MultiContext Systems [BS00,SG00], was also originally considered, but that comparison is now in a companion paper [deP]. This is because, strictly speaking, MCS/LMS has no explicit modality. However, it is well known that the bridge rules of their main system MR correspond, technically, to a K necessity operator.

In the next section we discuss why we worry about Natural Deduction, why constructivity is important for us, what constitutes a Natural Deduction formulation of a logic and why obtaining a Natural Deduction formulation for logics of context is problematic and worthwhile. Then, in the following sections, we give succinct descriptions of the systems of contexts we consider. After that we compare and evaluate those systems. The upshot is that we can produce a very stringent Natural Deduction formulation for what these systems have in common. The natural deduction formulation for this core constructive language is spelled out in detail in the following section.

2 Natural Deduction: Why? 

McCarthy, when first discussing the idea of contexts in AI, suggested that a “strong form of Natural Deduction” should hold for an intuitively appealing logic of contexts. His suggestion of a logic of contexts is based on the notion of a modality ist(κ, A). The intuition of using a modality operator to deal with logics of context is common to all the systems we discuss (and many others we do not). But the systems differ along three different dimensions. First, they differ on which properties the modalities are supposed to have; then they differ on how they are described mathematically, e.g. whether one uses axioms, tableaux systems or Natural Deduction rules; and finally they differ on which properties they prove of the systems they consider: whether they have soundness and completeness, and with respect to what kind of model.

We advocate the view that a logic should be independent of its different 
presentations, that is, that one should be able to give different presentations 
(using axioms, sequents, rules) for any decent logic, as we can do for e.g. classical 
or constructive first-order logic. Moreover, since these formalizations are only 
different presentations of the same logic, we believe that one must be able to 
prove them all equivalent, using syntactic translations between the systems. Thus 
our first aim is to prove that there is a decent logic of contexts, that is, there is 

² There are first-order systems in the literature, but we restrict our attention to propositional systems.

a formal logic of contexts which can be given in several different presentations, all proved equivalent.

McCarthy’s and Guha’s intuitions were formalized by S. Buvac, V. Buvac and I. Mason in [BBM95]. Their formalization was done in a Hilbert-style system, usually the easiest kind of formalism as far as modal logics are concerned. That paper leaves open how to formalize their propositional logic of contexts in a Natural Deduction setting. Actually, when discussing future work they say:

We also plan to define non-Hilbert style formal systems for context. 
Probably the most relevant is a natural deduction system, which would 
be in line with McCarthy’s original proposal of treating contextual rea- 
soning as a strong version of natural deduction. In such a system, entering 
a context would correspond to making an assumption in natural deduc- 
tion, while exiting a context corresponds to discharging an assumption. 

But this future work has not, as yet, come to fruition, which is not surprising, considering the amount of controversy surrounding Natural Deduction for Modal Logics in general. For some of this controversy (and a detailed explanation) the reader is directed to [BdPR01].

A formal description of what constitutes a Natural Deduction formalism will not be attempted here, but we take as paradigmatic the work of Prawitz [Pra65], which is sometimes described as Gentzen-style Natural Deduction, by contrast to Fitch-style Natural Deduction. Gentzen-style Natural Deduction derivations are tree-shaped, usually with one introduction and one elimination rule for each logical connective. More importantly, the introduction and elimination rules give rise to a notion of normalization (elimination of the ‘detour’ in the proof that consists of one introduction rule followed immediately by the elimination of the same connective). For intuitionistic logic this paradigm works very well, both for first-order and for higher-order calculi. For other logics, especially modal logics, the formalism does not work so well. Prawitz, for example, only deals with the systems called S4 and S5 in his treatise, and even that treatment is not optimal [BdP00]. In a nutshell, the problem is that it is hard³ to provide introduction and elimination rules for a K-style necessity (□) operator that satisfies only the sequent calculus (Scott’s) rule:

$\dfrac{\Gamma \vdash B}{\Box\Gamma \vdash \Box B}$

For a start, this rule is clearly both an introduction and an elimination rule. But the crux of the problem is how to write, using a tree-like derivation, that after the use of the necessitation rule all the premises become boxed. Proof-theoretic trees only grow downwards, not upwards. If instead of the usual Prawitz-style trees one tries to use Natural Deduction in sequent style, as advocated by Martin-Löf (which corresponds to writing the rule as above), the problem persists. One essential component of Natural Deduction is its ability to put proofs together. If you have proofs $\pi: A_1, \ldots, A_k \vdash B$ and $\sigma: C \vdash A_1$, you must be able to compose them in ND, obtaining $\sigma;\pi$, a proof of $C, A_2, \ldots, A_k \vdash B$. But if you apply the Box rule to $\pi$, obtaining $\Box A_1, \ldots, \Box A_k \vdash \Box B$, then you cannot compose it with $\sigma$ anymore. This is an unfortunate situation, and there are several very different solutions to this problem in the literature. Most of the solutions build some of the semantics of modal logic into its syntax: this is the case for Gabbay’s labelled deductive systems, Simpson’s framework and Basin et al.’s framework. The solution we prefer is merely syntactic (see Section 6), but there are tradeoffs, discussed later.

³ So hard that Bull and Segerberg in [BS84] discuss whether modal logic is not natural enough to have a Natural Deduction formulation.
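The composition failure can be seen in a deliberately crude model in which a proof is represented only by its end-sequent and composition is a cut on one premise. The names `compose` and `box_rule`, and the string encoding of formulas, are our own; the model ignores the internal structure of proofs and only shows why $\sigma$ no longer fits once $\pi$ has been boxed.

```python
from typing import Tuple

# A sequent is a pair (premises, conclusion); formulas are plain strings.
Sequent = Tuple[Tuple[str, ...], str]


def compose(sigma: Sequent, pi: Sequent) -> Sequent:
    """sigma ; pi: plug sigma : Gamma ⊢ A into a premise A of pi."""
    gamma, a = sigma
    delta, b = pi
    if a not in delta:
        raise ValueError(f"{a} is not a premise of the second proof")
    rest = list(delta)
    rest.remove(a)                        # drop one occurrence of A
    return (gamma + tuple(rest), b)


def box_rule(pi: Sequent) -> Sequent:
    """Scott's rule: from Gamma ⊢ B infer □Gamma ⊢ □B."""
    delta, b = pi
    return (tuple("□" + d for d in delta), "□" + b)


pi = (("A1", "A2"), "B")
sigma = (("C",), "A1")
print(compose(sigma, pi))                 # (('C', 'A2'), 'B')
try:
    compose(sigma, box_rule(pi))          # the premises are now □A1, □A2
except ValueError as err:
    print("cannot compose:", err)
```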

The (proof theoretic) received wisdom about logical formalisms is that: 

— Axiomatic systems are the easiest ones to devise and also the ones where it 
is easier to prove theorems about the system; 

— Sequent calculi are the systems that are easy to mechanize and 

— Natural Deduction systems are the ones most similar to the way humans 
construct proofs. 

It is also the case that, given a Gentzen-style Natural Deduction system, one can automatically derive both sequent calculus and axiomatic systems from it, but the converses are not always true. Hence Natural Deduction systems are the
most informative formalism. But exactly what constitutes a Natural Deduction 
system and, given that modal logics must depart somehow from the traditional 
setting, what are the most important properties to preserve is subject to personal 
taste and warrants discussion. 

Given that sequent calculi (and tableaux systems) are, arguably, better formalisms for automatic proof search, whereas Natural Deduction comes into its own when dealing with proof normalization, one may wonder why we worry about a Natural Deduction version of a constructive logic of contexts. On the one hand, we are interested in a deep understanding of the logic in question, and a Natural Deduction formalization gives the ability to change formalisms as explained above. Since the different formalisms do not constitute different systems, but are simply different presentations of a given system, a Natural Deduction presentation, together with its translations, affords logical respectability. On the other hand, our emphasis on constructivity of the logic explains an ulterior (and eventual) goal: we would like to use the Curry-Howard correspondence to provide a functional programming language for dealing with proofs of statements in context.

But even discounting the motivation of a Curry-Howard system for contexts, it is true that the exercise of comparing logics tends to clarify our understanding. This explains the emphasis on the comparison of the systems in this paper. Both Buvac, Buvac and Mason’s PLC and Nayak’s N are given as axiomatic systems, while Massacci’s calculus is given as a tableaux system, a close cousin of a sequent calculus. Thus we start by describing PLC and N and then we discuss Massacci’s system. After that we introduce our own Natural Deduction system.



3 The Propositional Logic of Contexts PLC 

Buvac, Buvac and Mason’s paper “Metamathematics of Contexts” [BBM95] is the most developed formalization of McCarthy’s ideas [McC93] about a propositional logic of contexts. Their propositional logic of contexts extends classical propositional logic in (at least) two ways: first, it adds a new modality $\mathrm{ist}(\kappa, \phi)$, used to express that the sentence $\phi$ holds or is true in context $\kappa$. Second, they postulate that each context has its own vocabulary, i.e. a set of propositional atoms that is meaningful in that context. They describe a basic logic of contexts, describe a semantics for this basic logic similar to the traditional semantics for first-order logic, discuss various extensions of this basic logic and give a correspondence theory, relating axioms to extensions of the basic semantics.

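We start with two given, distinct, countably infinite sets: $\mathcal{K}$, a set of labels (intuitively denoting some basic contexts), and $V$, the set of all propositional atoms. Then the well-formed formulas $\mathcal{F}$ are built from the sets $\mathcal{K}$ and $V$ by negation and implication, together with the $\mathrm{ist}(\kappa, \phi)$ operator:

$\mathcal{F} ::= V \mid \neg\mathcal{F} \mid \mathcal{F} \to \mathcal{F} \mid \mathrm{ist}(\mathcal{K}, \mathcal{F})$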



Instead of using simply the set $\mathcal{K}$ of basic labels, PLC uses the set $\mathcal{K}^*$ of finite sequences over $\mathcal{K}$. A context, denoted $\bar\kappa$, consists of a finite sequence $(\kappa_1 \ldots \kappa_n)$ of elements of $\mathcal{K}$ (or, in the degenerate case, $\epsilon$, the empty sequence). But when one writes $\mathrm{ist}(\bar\kappa, A)$ this actually means $\mathrm{ist}(\kappa_1, \mathrm{ist}(\kappa_2, \ldots \mathrm{ist}(\kappa_n, A) \ldots))$. This use of sequences of basic contexts corresponds to PLC’s intuition that what holds in a context depends on how you arrived at this context, so that $\kappa_1\kappa_2$ represents how context $\kappa_2$ is seen from context $\kappa_1$.
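The abbreviation $\mathrm{ist}(\bar\kappa, A)$ unfolds by nesting, with the innermost ist belonging to the last context of the sequence. A minimal Python sketch of the unfolding, rendering formulas as plain strings; the function name and the choice to return A unchanged for the empty sequence are our assumptions.

```python
def ist(contexts, formula: str) -> str:
    """Unfold ist(k1...kn, A) into ist(k1, ist(k2, ... ist(kn, A) ...)).

    The innermost ist belongs to the last context; for the empty
    sequence the formula is returned unchanged (our assumption).
    """
    for kappa in reversed(contexts):
        formula = f"ist({kappa}, {formula})"
    return formula


print(ist(["k1", "k2", "k3"], "A"))   # ist(k1, ist(k2, ist(k3, A)))
```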

We also need to explain the role of vocabularies. The intuitive idea is that a vocabulary (the set of meaningful propositional atoms) is defined for each context. Thus we have a relation Vocab between $\mathcal{K}^*$ and $V$. The notion of derivability ($\vdash_{\bar\kappa} A$) that defines PLC also depends on the vocabulary used, so it should be written with Vocab as an extra parameter, but PLC makes the simplifying assumption that, given any formula $A$ and context $\bar\kappa$, we can calculate the vocabulary of the formula $A$ in context $\bar\kappa$ using a function $\mathrm{Vocab}(\bar\kappa, A)$. Moreover, PLC’s Definedness Condition asserts that whenever we state A, we implicitly assume that $\mathrm{Vocab}(\bar\kappa, A)$ is contained in (the previously given and forever fixed) Vocab.

Buvac, Buvac and Mason assume the following axioms:

(taut) $\vdash_{\bar\kappa} A$ for all classical tautologies $A$

(K) $\vdash_{\bar\kappa} \mathrm{ist}(\kappa, A \to B) \to (\mathrm{ist}(\kappa, A) \to \mathrm{ist}(\kappa, B))$

(Δ) $\vdash_{\bar\kappa} \mathrm{ist}(\kappa, \mathrm{ist}(\kappa_1, A) \vee B) \to \mathrm{ist}(\kappa, \mathrm{ist}(\kappa_1, A)) \vee \mathrm{ist}(\kappa, B)$

together with the proof rules of context switching (CS) and Modus Ponens (MP) below.

(CS) $\dfrac{\vdash_{\bar\kappa\kappa_1} A}{\vdash_{\bar\kappa} \mathrm{ist}(\kappa_1, A)}$    (MP) $\dfrac{\vdash_{\bar\kappa} A \quad \vdash_{\bar\kappa} A \to B}{\vdash_{\bar\kappa} B}$

The axioms⁴ and rules above constitute the Hilbert-style system for PLC. Note that derivations are always in context, i.e. the turnstile is always decorated with the context sequence where the derivation occurs. We say A is provable in

⁴ The axiom (taut), valid for all systems considered in this note, is disputed by a referee, who suggests that truth in a context should be constrained by relevance to a context. But relevance is a much harder problem than localization of truth, which is the simplified aim of these logics of context.




Natural Deduction and Context as (Constructive) Modality 121 



context $\bar\kappa$ iff $A$ is an instance of an axiom schema or follows from provable formulae by one of the inference rules.

The axiom schemas (taut) and (K) are traditional, in that logics with modal- 
ities usually satisfy all tautologies of the basic (in their case classical) logic and 
the axiom K is generally considered the bare minimum to require of a modality. 
The Modus Ponens rule (MP) is also traditional, but adapted to hold in each 
and every context k. 

The context switching rule (CS) and the axiom (Δ) deserve some discussion.
It is easy to see that the context switching rule is more general than the usual 
modal necessitation rule. If one erases contexts from the derivability relation the 
context switching rule becomes the necessitation rule. But it is not immediately 
clear that whenever one uses the context switching rule in a PLC proof, the 
modal necessitation rule could have been used instead. 

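Let us call localized multimodal K the system consisting of two axiom schemas:

(taut) $\vdash_{\bar\kappa} A$ for all classical tautologies $A$

(K) $\vdash_{\bar\kappa} \mathrm{ist}(\kappa, A \to B) \to (\mathrm{ist}(\kappa, A) \to \mathrm{ist}(\kappa, B))$

together with the rules

(Nec*) $\dfrac{\vdash_{\bar\kappa} A}{\vdash_{\bar\kappa} \mathrm{ist}(\kappa_1, A)}$    (MP) $\dfrac{\vdash_{\bar\kappa} A \quad \vdash_{\bar\kappa} A \to B}{\vdash_{\bar\kappa} B}$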



Proposition 1 (Serafini) Assume that all contexts have the same vocabulary. Given a proof $\pi$ of $A$ in PLC, there exists a proof $\pi'$ of $A$ in the system localized multimodal K plus Δ.

Proof: Consider the first appearance of the context switching rule in $\pi$. Assume it uses $\vdash_{\bar\kappa\kappa_1} A$ to give $\vdash_{\bar\kappa} \mathrm{ist}(\kappa_1, A)$. The proof up to this use of context switching (CS) was all done in the context $\bar\kappa\kappa_1$. Since all axioms in the context $\bar\kappa\kappa_1$ are also axioms in $\bar\kappa$, and all uses of (MP) in $\bar\kappa\kappa_1$ are also uses in $\bar\kappa$, we can remove $\kappa_1$ from the whole proof, and after this transformation the proof looks like

$\dfrac{\vdash_{\bar\kappa} A}{\vdash_{\bar\kappa} \mathrm{ist}(\kappa_1, A)}$

Applying this transformation to all occurrences of the context switching rule, we obtain a proof that only uses the localized necessitation rule. □
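The transformation in this proof is a simple recursion on proof trees. The Python sketch below is our own rendering under the same-vocabulary assumption of the proposition: the `Step` class and the helper names are hypothetical, formulas are plain strings, and the sketch does not check that leaves are axiom instances. It processes innermost (CS) steps first, which plays the role of repeatedly taking the first appearance; after that, the subproof above each (CS) lives in a single context and can be moved wholesale into the context of the conclusion.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Step:
    rule: str                  # "ax", "MP", "CS" or "Nec*"
    context: Tuple[str, ...]   # the context sequence decorating the turnstile
    formula: str
    premises: Tuple["Step", ...] = ()


def recontext(p: Step, ctx: Tuple[str, ...]) -> Step:
    """Move a subproof carried out in a single context into context ctx."""
    return replace(p, context=ctx,
                   premises=tuple(recontext(q, ctx) for q in p.premises))


def localize(p: Step) -> Step:
    """Replace every (CS) step by (Nec*), innermost occurrences first."""
    prems = tuple(localize(q) for q in p.premises)
    if p.rule != "CS":
        return replace(p, premises=prems)
    (sub,) = prems             # after localizing, sub lives in one context
    return Step("Nec*", p.context, p.formula, (recontext(sub, p.context),))


ax = Step("ax", ("k", "k1", "k2"), "q -> q")
inner = Step("CS", ("k", "k1"), "ist(k2, q -> q)", (ax,))
outer = Step("CS", ("k",), "ist(k1, ist(k2, q -> q))", (inner,))
print(localize(outer))
```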

The reader will have noticed the assumption of all contexts having the same vocabulary. Recent work [BS00] by Bouquet and Serafini shows semantically that the vocabularies of PLC play no essential logical role. They say that their “Reduction to Complete Vocabularies” theorem allows them to conclude that PLC really is the normal multimodal logic K extended with the extra axiom Δ.

The axiom (Δ) is problematic from the proof-theoretic perspective. Buvac, Buvac and Mason say that axiom Δ corresponds to the validity reading of the modality $\mathrm{ist}(\kappa, A)$. They justify their adoption of axiom (Δ) (for their most generic logic of contexts) by saying that if we disregard vocabulary restrictions then Δ can be written as

(Δ') $\mathrm{ist}(\kappa_1, \mathrm{ist}(\kappa_2, A)) \vee \mathrm{ist}(\kappa_1, \neg\mathrm{ist}(\kappa_2, A))$

which they read as saying that “it is true in knowledge base κ₁ that A is valid in the knowledge base κ₂, or it is true in knowledge base κ₁ that A is not valid in the κ₂ knowledge base”. Thus each knowledge base behaves as if it can see into another knowledge base and decide for any formula A whether or not it is valid in the second knowledge base. But it is not clear that this kind of property is essential (or even sensible) for a basic logic of contexts. Actually [CP98] states: “This axiom [Δ] does not seem justified, even for the applications they consider. There is no reason why a database should have complete information about the contents of other databases.”



3.1 Evaluating PLC 

Buvac and Mason say that 

Modelling truth or validity in a context by a Kripke model, ie by a 
relation between worlds would not be intuitive, because we want contexts 
to be reified as first class objects in the semantics. This will allow us (in 
the predicate case) to state relations between contexts, define operations 
on contexts and specify how sentences from one context can be lifted 
into another contexts. 

But PLC is a propositional logic and its extension to the first-order case is far from trivial. Also, in the context of PLC no relations or operations between contexts are specified. Thus the only reason given by Buvac and Mason for not considering a Kripke-style semantics, that “it is not intuitive to model validity in a context by a relation between worlds”, seems too vague. It is a matter of taste, like saying that you should always use first-order logic if you can.

It is satisfying to have a sound and complete (first-order-like) semantics for PLC, and for some of its reasonable extensions, but it is not clear how much the semantics presented forces one to accept axiom Δ⁵. It is also not clear to me why such a first-order-like semantics is or would be better than a possible-worlds semantics. Thirdly, the role of vocabularies and whether one should have contexts modelled as sequences of basic contexts (or not) is still unclear.

Finally, note that to consider a constructive version of PLC we need to take as basis any axiomatization of constructive logic, and if we decide that the axiom (Δ) is not required, we just keep (CS), (MP) and (K); nothing more needs to be done.

⁵ This is actually a usual problem with any axiomatic system: it is always the case that other axioms might be better, less redundant or more informative. This is another reason for considering other formalisms for a “minimal” logic of contexts.
4 The Logic of Contexts for Multiple Theories N

Nayak [Nay94] takes a different view of the problem of devising a useful logic of 
contexts: he suggests that, for the purposes of representing and reasoning with 
multiple domain theories, rather than developing new syntax and new semantics 
for a logic, we can simply stick with a natural (multimodal) extension of a 
traditional modal logic. Nayak suggests writing a necessity modal operator for each context (contexts are simply labeled by natural numbers) and allowing different contexts to have different vocabularies.

Nayak presents two main reasons for treating contexts as modal operators, 
instead of extended terms, as in PLC. First, he says, in the propositional case the 
context operators and terms are effectively equivalent. Second, the advantage of 
contexts as terms is that it allows reasoning about contexts within the logic, 
but, he contends, most of the reasoning he wants to do about contexts and 
about relations between contexts can be done in a meta-theory. Hence it should 
be worthwhile investigating the properties of a simpler logic of contexts. 

The syntax of Nayak’s logic of contexts has a set of propositions $V$, as before; as contexts $\mathcal{K}$ it has the natural numbers $\{1, 2, 3, \ldots, n, \ldots\}$; and instead of $\mathrm{ist}(i, A)$ for $A$ in $V$, Nayak denotes that formula $A$ is valid in a context $i$ by an indexed necessity operator $C_i(A)$. To facilitate the comparison we will use PLC’s notation instead. Well-formed formulae are given by

$\mathcal{F} ::= V \mid \neg\mathcal{F} \mid (\mathcal{F} \to \mathcal{F}) \mid \mathrm{ist}(i, \mathcal{F}), \quad i \in \mathcal{K}$

Because Nayak’s logic pays attention to different vocabularies for different contexts, it defines a function vocabulary, which maps contexts to the collection of propositions defined for that context, $voc : \mathcal{K} \to 2^{V}$. Since some propositions are not part of the vocabulary of some contexts, we say that a well-formed formula $A$ is meaningful with respect to voc if, for any propositional letter $p$ occurring in $A$, if $p$ is immediately within a context $\mathrm{ist}(i, \cdot)$ then $p$ must be in the vocabulary of that context.
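Meaningfulness is a recursive check that tracks the innermost enclosing $\mathrm{ist}(i, \cdot)$. A minimal Python sketch, assuming our own tuple encoding of formulas (`("not", A)`, `("imp", A, B)`, `("ist", i, A)`, atoms as strings) and a dictionary for voc; following the definition above, atoms outside every ist are left unconstrained.

```python
def meaningful(formula, voc, ctx=None) -> bool:
    """Check meaningfulness of a formula with respect to voc.

    ctx is the context of the innermost ist(i, .) around the current
    subformula, or None when no ist encloses it.
    """
    if isinstance(formula, str):
        return ctx is None or formula in voc.get(ctx, set())
    tag = formula[0]
    if tag == "not":
        return meaningful(formula[1], voc, ctx)
    if tag == "imp":
        return (meaningful(formula[1], voc, ctx)
                and meaningful(formula[2], voc, ctx))
    if tag == "ist":
        _, i, body = formula
        return meaningful(body, voc, i)
    raise ValueError(f"unknown connective {tag!r}")


voc = {1: {"p"}, 2: {"q"}}
print(meaningful(("ist", 1, ("imp", "p", ("ist", 2, "q"))), voc))   # True
print(meaningful(("ist", 2, "p"), voc))                             # False
```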

Nayak assumes the following axioms:

(A1) $\vdash A$ for all (classical) meaningful tautologies $A$

(A2) $\vdash \mathrm{ist}(i, A \to B) \to (\mathrm{ist}(i, A) \to \mathrm{ist}(i, B))$, for $1 \le i \le n$

(where all formulae in axiom A2 are assumed meaningful), together with the proof rules of Necessitation and Modus Ponens below, where $\mathrm{ist}(i, A)$ is assumed meaningful.

(Nec) $\dfrac{\vdash A}{\vdash \mathrm{ist}(i, A)}$    (MP) $\dfrac{\vdash A \quad \vdash A \to B}{\vdash B}$

In a nutshell, Nayak proposes using a normal multimodal system K as the basic logic, but goes on to say that this axiomatization does not sufficiently restrict the properties of contexts or their inter-relationships. For the purpose of modelling these extra properties, he introduces three new axioms:

(A3) $\mathrm{ist}(\kappa, A) \to \mathrm{ist}(\kappa_1, \mathrm{ist}(\kappa, A))$

(A4) $\neg\mathrm{ist}(\kappa, A) \to \mathrm{ist}(\kappa_1, \neg\mathrm{ist}(\kappa, A))$

(A5) $\mathrm{ist}(\kappa, A) \to \neg\mathrm{ist}(\kappa, \neg A)$
The system consisting of multimodal K together with axioms A3, A4, A5 (called $\mathcal{F}_n$ in Nayak’s work) is called here N, for Nayak. Note that axiom A5 is a generalization of modal axiom D, i.e. it is D for every context operator, discussed in the extensions of PLC. Axioms A3 and A4 are the generalizations of positive introspection and negative introspection that appear in converse form in other extensions of PLC. Nayak’s logic N is greatly simplified: it does not need to deal with sequences of contexts, and these generalizations “ensure that every context knows about what every other context does and does not know, i.e. the facts true in a context are context independent”.

Making Nayak’s system constructive is a matter of making the propositional 
basis constructive and the basic modal operators constructive. Thus it is clear 
that it depends on deciding which shape of constructive modal logic one prefers. 



4.1 Evaluating N

There is much to recommend the use of ‘off the shelf’ logical systems. But it 
must be pointed out that the draconian simplifications brought about, especially 
by axioms A4 and A5 make Nayak’s theory applicable only to situations where 
the contexts are almost not related at all, as in his example of Saturn’s moon 
Titan and tropical forests. 

The simplifications brought about by the extra axioms seem too strong for 
a minimal logic of contexts. Having said that, it would be good to have Nayak’s 
system at one end of the spectrum of useful context logics. One problem is 
providing a natural deduction formulation for axioms A4 and A5. 

5 Massacci’s Tableaux System 

Massacci’s papers [Mas95,Mas96] deal with a tableaux version of a logic of con- 
texts. Massacci seems to be referring to PLC, as defined by Buvac, Buvac and 
Mason, but as we will discuss his logic proves more theorems than basic PLC. 
To describe the system Massacci calls T (for tableaux) we start with two dis- 
tinct countably infinite sets JC and V, the set of all basic contexts and the set 
of all propositional atoms. Then well- formed formulas T are built from the sets 
K. and V by negation and implication, together with the ist(«. A) operator. As 
in PLC, contexts are actually sequence of basic contexts and contexts determine 
the vocabulary of an application or theory. The vocabulary is as before described 
by a function vocab: K.* 2^ assigning to each context sequence k a subset of 

the basic propositions that are supposed to be meaningful in that context. 

But instead of axioms, Massacci introduces tableaux rules, together with a 
semantics in terms of “superficial valuations” . Massacci’s tableaux system uses 
formulae with labels and labelled deduction rules. The labels on the formulae 
have a double role: given a contextualized formula (7c[n]: A), k is a sequence of 
basic contexts, n is an integer and A is a well-formed formula as above. Intuitively 
the prefix K[n] ‘names’ the n-th superficial valuation, where A holds. 

The first three rules correspond to the classical propositional basis and are
standard for tableaux systems, except that they carry annotations telling you in
which context/world you are working.




Natural Deduction and Context as (Constructive) Modality 125 



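$$(\wedge)\ \frac{\kappa[n]: A \wedge B}{\kappa[n]: A,\ \kappa[n]: B} \qquad (\neg\wedge)\ \frac{\kappa[n]: \neg(A \wedge B)}{\kappa[n]: \neg A \mid \kappa[n]: \neg B} \qquad (\neg\neg)\ \frac{\kappa[n]: \neg\neg A}{\kappa[n]: A}$$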
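To make the bookkeeping of the three propositional rules concrete, here is a minimal Python sketch. The encoding is an assumption for illustration only, not Massacci's notation: formulas are nested tuples such as ("and", A, B), and a labelled formula κ[n]: A is the triple (kappa, n, A); the function name `expand` is hypothetical. What the sketch shows is that these rules copy the prefix κ[n] unchanged, so they never move between contexts or valuations.

```python
from typing import List, Tuple

# Hypothetical encoding (not Massacci's notation): formulas are nested tuples
# such as ("atom", "p"), ("not", A), ("and", A, B), ("ist", k, A); a labelled
# formula kappa[n]: A is the triple (kappa, n, A), with kappa a tuple of
# basic contexts and n the index of a superficial valuation.
Formula = tuple
Labelled = Tuple[Tuple[str, ...], int, Formula]


def expand(lf: Labelled) -> List[List[Labelled]]:
    """Apply the matching propositional rule to kappa[n]: A.

    Returns one list per resulting branch, holding the formulas added to
    that branch; returns [] when no propositional rule applies.
    """
    kappa, n, a = lf
    if a[0] == "and":                         # (and): one branch, both conjuncts
        return [[(kappa, n, a[1]), (kappa, n, a[2])]]
    if a[0] == "not" and a[1][0] == "and":    # (not-and): split into two branches
        _, b, c = a[1]
        return [[(kappa, n, ("not", b))], [(kappa, n, ("not", c))]]
    if a[0] == "not" and a[1][0] == "not":    # (not-not): drop double negation
        return [[(kappa, n, a[1][1])]]
    return []  # literals and ist-formulas are left to the other rules


p, q = ("atom", "p"), ("atom", "q")
print(expand((("k1",), 1, ("not", ("and", p, q)))))
```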

The next two rules, called “database rules”, require some explanation. The
local contextual database LB is a set of formulae holding in the initial context
κ0. The global contextual database GB contains the formulae holding in every
context sequence κ extending the initial sequence κ0. These rules are necessary
to deal with logical consequence in modal logics, but are not related to the
essence of contexts.

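$$(\mathrm{Loc})\ \frac{}{\kappa_0[1]: A}\quad \text{if } A \in LB, \text{ where } \kappa_0 \text{ is the initial context}$$

$$(\mathrm{Glob})\ \frac{}{\kappa[n]: A}\quad \text{if } A \in GB \text{ and } \kappa[n] \text{ is present and extends the initial context}$$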
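The two database rules can be sketched in the same hypothetical tuple encoding. We assume that "κ extends κ0" means that κ0 is an initial segment of the sequence κ, and that the caller supplies the prefixes κ[n] already present on the branch; `extends` and `database_formulas` are illustrative names, not part of Massacci's presentation.

```python
from typing import Iterable, List, Tuple

Context = Tuple[str, ...]        # a context sequence, e.g. ("c0", "c1")
Prefix = Tuple[Context, int]     # kappa[n] as (kappa, n)


def extends(kappa: Context, kappa0: Context) -> bool:
    """Assumed reading: kappa extends kappa0 if kappa0 is an initial segment of kappa."""
    return kappa[: len(kappa0)] == kappa0


def database_formulas(kappa0: Context, lb: Iterable, gb: Iterable,
                      prefixes: Iterable[Prefix]) -> List[tuple]:
    """Labelled formulas (kappa, n, A) contributed by (Loc) and (Glob)."""
    out = [(kappa0, 1, a) for a in lb]          # (Loc): kappa0[1]: A for A in LB
    gb = list(gb)
    for kappa, n in prefixes:                   # (Glob): at every present prefix
        if extends(kappa, kappa0):              # whose sequence extends kappa0
            out.extend((kappa, n, a) for a in gb)
    return out
```

Note that (Glob) is the only rule here that depends on the current state of the branch, which is why it has to be re-applied whenever a new prefix appears.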

The last two rules, positive and negative lifting, deal with the essence of
contexts. In a sense they reproduce the effects of the modal axiom K and of the
necessitation rule, plus the effect of the extra axiom Δ.



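$$(\text{P-lift})\ \frac{\kappa[n]: \mathit{ist}(k, A)}{\kappa k[m]: A}\quad \text{if } \kappa k[m] \text{ is present in the branch}$$

$$(\text{N-lift})\ \frac{\kappa[n]: \neg\mathit{ist}(k, A)}{\kappa k[m]: \neg A}\quad \text{if } \kappa k[m] \text{ is new for the branch}$$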
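The lifting rules are the only ones that change the context part of a prefix. The Python sketch below, a hypothetical encoding rather than Massacci's implementation, applies them to a branch represented by its set of present prefixes (kappa, m). Reading ‘new for the branch’ as ‘the smallest index m not yet used with κk’ is our assumption; any unused index would serve.

```python
from itertools import count
from typing import List, Set, Tuple

Context = Tuple[str, ...]
Prefix = Tuple[Context, int]


def p_lift(kappa: Context, formula: tuple, prefixes: Set[Prefix]) -> List[tuple]:
    """(P-lift): from kappa[n]: ist(k, A), add (kappa k)[m]: A for every
    prefix (kappa k)[m] already present in the branch."""
    _, k, a = formula                          # formula == ("ist", k, A)
    target = kappa + (k,)
    return sorted((target, m, a) for (seq, m) in prefixes if seq == target)


def n_lift(kappa: Context, formula: tuple, prefixes: Set[Prefix]) -> List[tuple]:
    """(N-lift): from kappa[n]: not ist(k, A), add (kappa k)[m]: not A for a
    prefix (kappa k)[m] that is new for the branch."""
    _, (_, k, a) = formula                     # formula == ("not", ("ist", k, A))
    target = kappa + (k,)
    used = {m for (seq, m) in prefixes if seq == target}
    m = next(i for i in count(1) if i not in used)   # smallest unused index
    prefixes.add((target, m))                  # the fresh prefix is now present
    return [(target, m, ("not", a))]
```

P-lift only reuses prefixes that already exist, while N-lift creates one; this mirrors the usual treatment of box-like and diamond-like formulas in prefixed tableaux.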

Massacci shows that the axiom (Δ) is derivable in his tableaux system, but
does not prove syntactic equivalence between the systems PLC and T: ideally we
would like a theorem of the form $\vdash_{PLC} A$ iff $\vdash_{T} A$. To obtain such a theorem we need to
show how to derive the rules of positive and negative lifting using the axioms
and rules of PLC.



5.1 Evaluating Massacci’s Systems 

Massacci claims two main advantages for his system. First, the rules reflect
“epistemic properties (lifting, use of assumptions, etc)”; this seems too subjective
a criterion. Second, he proves computational properties: the system allows for local
and incremental computation, satisfies strong confluence and can be adapted to
different search heuristics. These computational advantages are clear, since tableaux calculi
are usually better for proof search than axiomatic systems. Moreover, his kind of tableaux
was devised for efficient automated theorem proving, which is always useful.
Hence it would seem a good idea to constructivize T and to try to prove the
conjecture above that $\vdash_{PLC} A$ iff $\vdash_{T} A$. But I do not see how to mimic the positive
and negative lifting rules of T using PLC’s axioms and rules, and I guess the
proof that this works goes via the semantics. Since Massacci’s work builds on
his adaptation [Mas94] of Fitting’s work on prefixed tableaux, this whole theory
needs to be used, which is very unsatisfactory.



6 Our Natural Deduction Formulation 

The Natural Deduction system of contexts we have developed works only for
the normal multimodal K fragment of PLC, N and T, that is, for a system
we could call K_n. Of course one could always add the axiom Δ to this
system, but adding any axiom to a natural deduction system seems a bad idea.
Our natural deduction system comprises the usual natural deduction rules for
the propositional connectives, plus the following schema of rules, one for each
modality ist(κ, _).

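$$\frac{\Gamma \vdash \mathit{ist}(\kappa, \vec{A}) \qquad A_1, A_2, \ldots, A_k \vdash B}{\Gamma \vdash \mathit{ist}(\kappa, B)}\ (\Box_\kappa)$$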

where by $\Gamma \vdash \mathit{ist}(\kappa, \vec{A})$ we mean a sequence of derivations $\Gamma \vdash \mathit{ist}(\kappa, A_1)$,
$\Gamma \vdash \mathit{ist}(\kappa, A_2), \ldots, \Gamma \vdash \mathit{ist}(\kappa, A_k)$. This is an old formulation of normal modal
K, dating back at least to the mid-eighties [Bel85]. Readers familiar with the
formulation of the necessitation rule for system K in sequent calculus should
note that the new rule □κ “builds in” substitutions.
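How the rule □κ builds in substitutions can be seen in a small rule checker. In this Python sketch (a hypothetical encoding: a sequent is a pair of a frozenset of hypotheses and a conclusion, with ist(κ, A) written ("ist", kappa, A)), the premises Γ ⊢ ist(κ, A_i) are taken as already derived and are not checked recursively; the hypothetical function `box_rule` only validates the shape of one application of the rule. With k = 0 the rule degenerates to necessitation.

```python
from typing import FrozenSet, List, Tuple

Sequent = Tuple[FrozenSet[tuple], tuple]      # (Gamma, conclusion)


def box_rule(gamma: FrozenSet[tuple], kappa: str,
             boxed: List[Sequent], inner: Sequent) -> Sequent:
    """Apply the rule box_kappa.

    boxed: the k derivations Gamma |- ist(kappa, A_i), in order;
    inner: the derivation A_1, ..., A_k |- B.
    Returns Gamma |- ist(kappa, B). With k = 0 this is necessitation.
    """
    a_s = []
    for g, concl in boxed:
        # each boxed premise must share Gamma and be an ist-formula for kappa
        if g != gamma or concl[0] != "ist" or concl[1] != kappa:
            raise ValueError(f"{concl} is not of the form ist({kappa}, A_i) under Gamma")
        a_s.append(concl[2])
    hyps, b = inner
    # the inner derivation is "substituted" into the boxed premises
    if hyps != frozenset(a_s):
        raise ValueError("the inner derivation must assume exactly A_1, ..., A_k")
    return (gamma, ("ist", kappa, b))
```

For example, from Γ ⊢ ist(k, p) and Γ ⊢ ist(k, q) together with p, q ⊢ p ∧ q, the rule yields Γ ⊢ ist(k, p ∧ q) in a single step, without a separate necessitation followed by modus ponens.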

The monomodal system using this rule (for a single modality □) over a
constructive basis is discussed in detail in [BdPR01]. In that paper, several possible
formulations of a natural deduction system for a basic notion of necessity
are discussed and compared. In particular, it sketches a discussion of Fitch-style Natural
Deduction [F52] and its formulation as a framework for constructive modal logics,
versus Prawitz-style Natural Deduction, and of why we prefer the latter*.
The reason is simply that it is not obvious how to provide categorical semantics
for Fitch-style Natural Deduction formalisms, whereas it is for Prawitz-style
natural deduction. It is also briefly mentioned in [BdPR01] that we do not discuss
approaches to constructive modal logics that use the semantics of modal logics,
in terms of Kripke models and accessibility relations, as part of the syntactic
information used to characterize these systems. Using the intended semantics
to define your syntax may not be cheating, but it feels somewhat underhand,
especially when proving soundness of your system. Approaches along these lines
include Gabbay’s labelled deductive systems and Simpson’s framework. Clearly
our system is not a framework: we can only handle a few modal systems (K and S4,
possibly KT, KD, K4), and indeed the rules change according to the system that
we are considering. Our only advantage at the moment, when compared to the
frameworks mentioned before, is that we produce a semantics of proofs for the systems
we can deal with. This was the goal from the beginning.

* One preliminary answer to the question of how the Curry-Howard isomorphism would help
context logics is that a type theory with context modalities could easily be implemented in
an interactive theorem prover such as Isabelle or PVS, and this would facilitate the
creation and interconnection of large repositories of theories.

The multimodal extension of the system does not appear to present any
problems. Localized derivation (i.e. derivation in context) can also be done, by
labelling the turnstile with a given context k.

$$\frac{\Gamma \vdash_{k} \mathit{ist}(\kappa, \vec{A}) \qquad A_1, A_2, \ldots, A_k \vdash_{k} B}{\Gamma \vdash_{k} \mathit{ist}(\kappa, B)}$$

We call the system without localization B, after Bellin’s 1984 paper, and
we can prove that B satisfies strong normalization/cut-elimination and the subformula
property, and also enjoys a simple categorical semantics in terms of a cartesian
closed category together with a finite collection of endofunctors, one for each
modality. We expect similar properties to hold for the localized system, but
have not had time to verify this.



7 Comparing Systems 

Both PLC and Nayak’s system N are given as axiomatic systems, so comparing
them first seems natural. Nayak’s system N is clearly too simplified to compare
with PLC, but given the system N without axioms A3, A4, A5, and PLC without
Δ, do we have the same system? The question hinges on the effect of sequences
of contexts, versus individual modalities, decorating the derivability relation.
As we have seen, the context switching rule of PLC can be replaced by the
necessitation rule of localized multimodal K, if differences of vocabulary are
disregarded. But is this as general as the usual multimodal K?

For instance, if we have two unrelated contexts κ1 and κ2, which can only be
concatenated to form κ1 * κ2, and A is a theorem, the following derivations are
perfectly fine in K_n:



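$$\dfrac{\dfrac{\vdash A}{\vdash \Box_{\kappa_2} A}}{\vdash \Box_{\kappa_1} \Box_{\kappa_2} A} \qquad\qquad \dfrac{\dfrac{\vdash A}{\vdash \Box_{\kappa_1} A}}{\vdash \Box_{\kappa_2} \Box_{\kappa_1} A}$$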



But presumably in PLC only one of them would be valid: if κ1 * κ2 is a valid
context sequence whereas κ2 * κ1 is not, then the context switching rule can
only be applied to κ1 * κ2. At least for PLC this seems to be the case since, if the
sequence of contexts does not matter, they describe the model as flat. Only if all
context sequences formed from a given set 𝒦 are valid, and only the distinct
elements of any context sequence matter, i.e. if the derivability relation
$\vdash_{\kappa_1 * \cdots * \kappa_n}$ denoted by a context sequence is equivalent to the derivability relation
denoted by any permutation of that sequence, is Bouquet and
Serafini’s claim that “PLC is just the normal multimodal logic K extended with
the Δ axiom” justified. If “the new theorems proved in PLC with respect to
normal multimodal K are only due to Δ”, then PLC is indeed a sublogic of N
and of Massacci’s T, and in terms of provability exactly equivalent to Bouquet
and Serafini’s MPLC.
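The contrast between K_n and PLC drawn above can be phrased as a small decision problem. The sketch below is hypothetical: it encodes ist(κ1, ist(κ2, A)) as nested tuples and models the PLC-style restriction by requiring every initial segment of the context sequence to belong to a given set of valid sequences, which is our simplifying assumption about how context switching is constrained. It also checks the permutation-closure condition under which, on the reading above, the order of contexts stops mattering.

```python
from itertools import permutations
from typing import Optional, Set, Tuple

Seq = Tuple[str, ...]


def nest(theorem: tuple, seq: Seq) -> tuple:
    """ist(k1, ist(k2, ... ist(kn, A))) for seq = (k1, ..., kn): necessitation
    is applied for kn first and for k1 last."""
    f = theorem
    for k in reversed(seq):
        f = ("ist", k, f)
    return f


def derivable(seq: Seq, valid: Optional[Set[Seq]] = None) -> bool:
    """Can nest(A, seq) be derived from a theorem A?  In K_n (valid=None) every
    order is allowed; under the assumed PLC-style restriction, every initial
    segment of seq, including seq itself, must be a valid context sequence."""
    if valid is None:
        return True
    return all(seq[:i] in valid for i in range(1, len(seq) + 1))


def closed_under_permutation(valid: Set[Seq]) -> bool:
    """True if every permutation of a valid sequence is itself valid."""
    return all(p in valid for s in valid for p in permutations(s))
```

With valid = {(κ1,), (κ2,), (κ1, κ2)}, both orders are derivable in K_n but only κ1 * κ2 under the restriction; adding (κ2, κ1) makes the set closed under permutation and removes the difference.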
Comparing Massacci’s system T to PLC, we see that since the axiom D is not
provable in PLC, but is directly derived from the negative lifting rule in T, T seems
to prove more theorems than PLC. Massacci has also proved that T proves all
theorems that PLC proves, but it is not clear whether T proves only these.

8 Conclusions 

Comparing the four systems in this note, it seems that a designer/user of a logic
of contexts has plenty of choice between systems. He may choose not to have the
axiom (Δ) at all, in which case our (localized) system, the restricted version of
PLC and some variation of the tableaux system T should all be provably equivalent.
This corresponds to a decent minimal logic of contexts. A context logic can also
have the axiom Δ explicitly, as in PLC, or obtain its effect via multimodal KD, as
Massacci’s system seems to indicate. If the effect of Δ is desired, the second route
may be best, as one has at least the axiomatic and the tableaux versions already
in place. Finally, our context logic user may opt for a simplified logic of contexts
along the lines of Nayak’s system. In that case, I do not know how to provide
a sequent calculus or a Gentzen-style Natural Deduction formulation.
Proof-theoretic tricks, such as taking the formulas of the system only up to the
equivalence relation that identifies O^A with …, can be used, but the effect is
not elegant. Lastly, the comparison between our system and the MCS/LMS work
deserves more discussion than can be given here [deP]. Briefly, the MCS/LMS
systems seem to be able to embed all traditional modal logics, constructively or
not, very easily, in what is a generalization of Natural Deduction. But it is not
clear to me how to decorate MCS/LMS proofs with terms in a Curry-Howard
isomorphic way. This, as well as proof semantics for those systems, is a subject
for further work.

Acknowledgments. I would like to thank T. Altenkirch, C. Condoravdi, D.
Crouch, R. Guha, L. Serafini and, especially, N. Alechina and T. Braüner for
discussions.
Abstract. This contribution examines the connectedness between the
production and understanding of language, and between language use and
context. It is firmly anchored to a relational conception of context and adapts
Clark’s [4] conception of language use as both cognitive and social. The
introduction spells out the basic premises for assigning language production and
understanding the status of social actions. The second part discusses the
connectedness between language production, language understanding and
communicative contribution by examining the premises for the differentiation
between linguistic competence and communicative performance. The third part
extends the micro frame of investigation by accommodating a further layer of
context and contextual constraints: communicative genre [18]. Unlike a
communicative contribution, a communicative genre is a collectively oriented
macro category based on collaboration, cooperation [13], We-intentionality [23]
and social intelligence [12]. It functions as a filter by constraining the
production and understanding of possible micro communicative contributions in
accordance with a particular macro goal.



1 Introduction 

Language is neither generally produced at random, nor is it understood at random.
Rather, with the intention to communicate, a speaker produces one or more utterances
for a particular hearer.* Utterances are not only produced and directed at a
coparticipant in a particular situation, but they are also produced to achieve one or
more communicative goals. Following Clark [4], language use is conceived of as both
cognitive and social: coparticipants perform cognitive and social actions by producing
language, viz. they perform one or more communicative acts, and they understand
language by performing cognitive and social actions through which they infer their
fellow coparticipants’ communicative intentions. Against this background, language



* Speakers and hearers are assigned a dual function in communication: a speaker also acts as a
hearer, and vice versa. Following Schegloff [21], I refer to them and their dual function by
the term coparticipant, which is intended to denote the fuzziness of the two categories.

P. Blackburn et al. (Eds.): CONTEXT 2003, LNAI 2680, pp. 130-141, 2003. 
use is an intentional and rational endeavour, which represents a social action par
excellence: it is anchored to at least two coparticipants, at least one utterance,
and context. As regards the production and understanding of language, both are
anchored to cognitive contexts; as regards the retrieval of particular presupposed
information in order to produce or calculate communicative meaning, the relevant
propositions are anchored to (1) linguistic context, which frames the actual language
output, (2) cognitive context, which frames the actual language input, and (3) social
context^, which frames the linguistic context in which language output is produced.
These different types of context are conceived of in a relational manner and are, for
this reason, interconnected: social-context information interacts with linguistic-
context information, linguistic-context information interacts with cognitive-context
information, and cognitive-context information interacts with linguistic-context and
social-context information. As a consequence of this relational perspective, context
denotes a dynamic concept which is constantly updated in communication, and, if
necessary, revised and recontextualized [6].

The goal of this contribution is to examine the connectedness between the 
production and understanding of language on the one hand, and between language use 
and context on the other hand. It is firmly anchored to an ethnomethodological 
perspective, according to which social context is constructed in and through the 
process of communication [7]. The following section discusses the connectedness 
between language production, language understanding and communicative 
contribution by examining the premises on which the differentiation between 
linguistic competence and communicative performance is based. It shows that an 
interactive conception of these types of competence allows for the accommodation of 
cognitive, linguistic and social contexts which are required for assigning the minimal 
unit of language use, the utterance, the status of a communicative contribution. The 
third section extends the micro frame of investigation in order to accommodate 
macro-oriented layers of context and their contextual constraints: communicative 
genres [18] and networks [23; 19]. The production and understanding of a micro 
contribution must be in accordance with the corresponding communicative-genre 
goals, and this is the reason why a communicative genre is assigned the function of a 
filter. Unlike the micro notion of communicative contribution, a communicative
genre is a collectively oriented macro category based on collaboration, cooperation 
[13], We-intentionality [23] and social intelligence [12]. In conclusion, a relational 
and interactive conception of language use in context requires both micro and macro 
categories, such as individual and social intelligence, I-intentionality and We- 
intentionality, communicative contribution and communicative genre, and 
coordination and cooperation. 



^ This contribution does not explicitly distinguish between social context and sociocultural 
context. Instead, it employs the superordinate term of social context. This is because 
sociocultural context is interpreted as representing a particular subset of social context 
specified by culture-specific instantiations of, for example, basic deictic categories of space
and time, or ideological conceptual meanings, such as democracy, freedom, autonomy or 
individual. 
2 Language Production, Language Understanding and
Communicative Contributions

One of the fundamental premises in natural-language research is the differentiation 
between linguistic competence and communicative performance, and their underlying 
pillars of well-formedness and grammaticality on the one hand, and acceptability and 
appropriateness on the other hand. Contrary to the division of language into two
independent systems^, namely internal competence and external performance, this
contribution adopts a relational and dynamic outlook on language and argues for
interactive internal and external systems which are interdependent with cognitive,
linguistic and social contexts. But what is the nature of the connectedness between
linguistic competence and communicative performance? Following Allen and
Seidenberg [1:117], there is no clear-cut answer to this question: “The mapping
between competence grammar and performance is at best complex, as we have noted;
it is also largely unknown. A problem arises because the primary data on which the
standard approach [generative linguistics] relies - grammaticality judgements - are
themselves performance properties.” Against this background, the premise of two
autonomous systems, viz. external E-language, which exists outside the mind and is
independent of it, and internal I-language, which exists inside the mind and is
connected with it, no longer seems reasonable. Instead, a functional-grammar
outlook on language is adopted, according to which internal sentences are related to
external utterances. But where do internal sentences and external utterances meet, and
where do they depart? Following Givón [10:1], internal and external grammars serve
different purposes:

Perhaps the best way of saying what grammar is from a functional 
perspective is to say first what grammar is not. Grammar is not a rigid set of 
rules that must be followed in order to produce grammatical sentences. 
Rather, grammar is a set of strategies that one employs in order to produce
coherent communication.

Nothing in this formulation should be taken as denial of the existence of 
rules of grammar. Rather, it simply suggests that rules of grammar - taken 
as a whole - are not arbitrary; they are not just for the heck of it. The 
production of rule-governed grammatical sentences is the means by which 
one produces coherent communication. 

Thus, the rules of an internal grammar play a decisive role in the production and 
interpretation of utterances in context, to which they are expected to be connected in a 
coherent manner. But is their connectedness to context a sufficient condition for the 
definition of rule-governed grammatical sentences? To answer this question, we have 
to be more precise about the notions of coherence, context and rule-governed 
grammatical sentence. 



^ Some frameworks differentiate between an internal I-language, which is not affected by 
potential performance-related inadequacies resulting from interfacing logical form (LF) and 
phonetic form (PF) modules, and an external language, which is the realization of the 
interfacing modules of I-language in external linguistic and social contexts. 



Communicative Contributions and Communicative Genres 



133 



In contrast to the mutually exclusive definitions of internal language and external
language, functional grammar assigns language a dynamic status and treats it as a
constitutive part of the cognitive system [10]. Here, a rule-governed
grammatical sentence is one means amongst others to produce coherent discourse.
This functional outlook on rule-governed grammatical sentences requires the explicit
accommodation of (1) coherence, (2) social, linguistic and cognitive contexts, to
which the sentences and propositions are expected to be connected in a coherent
manner, and (3) communicative intention, which is a basic requirement for the
production of coherent or dovetailed conversation. Against this background, the
production and interpretation of sentences in context is directly related to the
production and interpretation of communicative contributions, which are produced
and interpreted in accordance with the Cooperative Principle (CP), namely “such as
is required, at the stage at which it occurs, by the accepted purpose or direction of the
talk exchange” [13: 45]. In the framework of the CP, coherence is spelled out by the
concepts of “such as is required” and “dovetailed” [13:48], and in the framework of
functional grammar, it is explicated by the surface phenomena of reference, ellipsis
and substitution, conjunction, cohesive links and lexical cohesion, and by the macro
phenomenon of discourse topic. As regards their connectedness to context, both
coherence and the Gricean CP are anchored to some type of sequence and thus to
cognitive, linguistic and social contexts. As a consequence of this, rule-governed
sentences and their instantiations in context, viz. communicative contributions, must
be examined with regard to coherence and to their connectedness to cognitive,
linguistic and social contexts.

Assigning rule-governed sentences and their instantiations in context, i.e. language
production and language understanding, the status of a communicative contribution
connects them with the fundamental pragmatic premises of rationality, intentionality and
appropriateness. The former manifest themselves in the coparticipant’s production
and interpretation of communicative contributions set in a game of giving and asking
for reasons, which Brandom [2:xxi] makes explicit as follows: “In a weak sense,
any being that engages in linguistic practices, and hence applies concepts, is a
rational being; in the strong sense, rational beings are not only linguistic beings but,
at least potentially, also logical beings”. Brandom is even more explicit in his
conception of rationality [2:117]: “Rationality consists in mastery of those practices
[the game of giving and asking for reasons, as Sellars calls it]. It is not to be
understood as a logical capacity. Rather, specifically logical capacities presuppose
and are built upon underlying rational capacities.” Against this background,
appropriateness is necessarily anchored to coparticipants, communicative
contributions and contexts, and is calculated with regard to the connectedness
between the linguistic representation of a communicative intention and its social and
linguistic contexts. For this reason, appropriateness feeds on both external and
internal languages: it draws from internal language for the formulation and
interpretation of coparticipant-intended meaning, and it draws from external language
with regard to the connectedness between a communicative contribution and its
linguistic and social contexts. The anthropologist Muriel Saville-Troike [20:53-54]
specifies the concept of appropriateness as follows: “The choice of appropriate
language forms is not only dependent on static categories, but on what precedes and
follows in the communicative sequence, and on information which emerges within the
event which may alter the relationship of participants.” Put differently, a necessary
condition for the definition of appropriate language use is the notion of choice, which
manifests itself in the coparticipant’s ability to choose between a more appropriate
utterance and a less appropriate utterance. This is also reflected in the principle of
sociolinguistic variation [3], according to which language is used differently in
different situations.* Implicit in this principle is the premise that a particular
communicative intention, for instance a request, can be expressed by an almost
infinite number of utterances, and thus in a more or less explicit and a more or
less polite manner. In view of this connectedness between the language
system (linguistic expressions and grammatical constructions), language use
(utterances and communicative contributions), social practice (what is considered
appropriate or inappropriate by a speech community) and context, Brown [3:169]
points out that

[o]ne cannot mechanistically apply the Brown and Levinson model of 
politeness strategies to discourse data; particular linguistic realizations are 
not ever intrinsically positively or negatively polite, regardless of context. 
Politeness inheres not in forms, but in the attribution of polite intentions, and 
linguistic forms are only part of the evidence interlocutors use to assess 
utterances and infer polite intentions. 

She then elaborates on the prerequisites of a linguistically and socioculturally 
competent coparticipant. In order to coparticipant-intend politeness, coparticipants 
have to be able to monitor their fellow coparticipants’ and their own actions, and 
possible perlocutionary effects in the framework of AIP (anticipatory interactive 
planning): 



To operate according to the model, speakers have to be able to modify the 
expression of their communicative intentions so as to take account of 
what they see as their interlocutor’s views of what they might be taken to 
be wanting to communicate, including what impositions to face might be 
on the table, as well as his or her assessments of the speaker’s and 
hearer’s relative power and social distance. [3:154] 

The foundations on which AIP is based are (1) the calculation of the
coparticipants’ social actions, (2) their possible perlocutionary effects, and (3) the 
degree of politeness communicated. According to the view taken here, this is only 
possible if coparticipants are intentional and rational agents who act in accordance 



* What is of interest here is the fact that the principle of sociolinguistic variation is not
restricted to the domains of language use and social practice. It is also inherent in Brandom’s
[2:425] philosophy-of-language approach and his premise that

[a] language cannot refer to an object in one way unless it can refer to it in two 
different ways. This constraint will seem paradoxical if referring to an object by 
using a singular term is thoughtlessly assimilated to such activities as using a car to 
reach the airport or using an arrow to shoot a deer: even if only one car or one arrow 
is available and impossible to reuse, what one is doing can still genuinely be driving 
to the airport or shooting the deer. 
with the principle of sociolinguistic variation by selecting the most appropriate form
for their goals. That is, they choose particular lexical expressions amongst other 
possible expressions, particular grammatical constructions amongst a number of other 
possible constructions, and particular intonational contours amongst a number of 
other possible contours. 

To conclude, it does not seem reasonable to see communicative performance as an 
activity independent of the mind. Instead, it must be re-evaluated and recontextualized 
as expressing a speech community’s conception of what is appropriate and what is 
not. Appropriateness manifests itself in social practice and its underlying rules and 
regularities and therefore denotes a far more complex phenomenon than had been 
anticipated: it is anchored to (1) the linguistic units of sentence, proposition and 
utterance, (2) the principle of sociolinguistic variation, which is a constitutive part of 
intentional and rational communication and manifests itself in the coparticipant’s 
anticipatory interactive planning, (3) the exchange of communicative contributions, 
(4) the coparticipants seen from I-we and I-thou perspectives [2:508], and (5) their
micro and macro linguistic and social contexts. As a consequence of this change of
perspective, linguistic competence is a subset of communicative competence and
denotes the coparticipant's ability to differentiate between grammatical and
ungrammatical sentences, appropriate and inappropriate communicative
contributions, and coherent and incoherent dialogue, as well as her/his ability to
produce coherent dialogue and interpret dialogue in a coherent manner, which is
examined in the following section. 



3 Language Production, Language Understanding, and 
Communicative Genre 

Coherence is a macro concept which is anchored to the macro-oriented domain of 
communicative genre. From a relational viewpoint, it is also reflected in its 
communicative contributions. Thomas Luckmann [18:177] explicates the functions 
and forms of a communicative genre as follows: 

Communicative genres operate on a level between the socially constructed 
and transmitted codes of ‘natural’ languages and the reciprocal adjustment of 
perspectives, which is a presupposition for human communicative 
interaction. They are a universal formative element of human 
communication. 

(...) 

Human communicative acts are predefined and thereby to a certain extent 
predetermined by an existing social code of communication. This holds for 
both the ‘inner’ core of that code, the phonological, morphological, semantic 
and syntactic structure of the language, as well as for its ‘external’ 
stratification in styles, registers, sociolects and dialects. In addition, 
communicative acts are predefined and predetermined by explicit and 
implicit rules and regulations of the use of language, e.g. by forms of 
communicative etiquette.
Luckmann’s references to “predefined” and “predetermined” are also implicit in
John Heritage’s [15:242] notion of doubly contextual, which designates the dual
function of linguistic context in conversational interaction: on the one hand, a
communicative contribution invokes linguistic context by constructing it itself; on the
other hand, its production and interpretation themselves provide the context for the
subsequent talk. That is, a communicative contribution relies upon existing context
for its production and interpretation, and it is in its own right an event which shapes a
new context for the action that will follow. Thus, the act of speaking and interpreting
constructs contexts and at the same time constrains the construction of contexts. In the
following, Luckmann’s outlook on the macro category of a communicative genre and
its constitutive pillar of coherence is refined by the explicit accommodation of (1)
the intentionality of social action, (2) I-thou sociality [2] and We-intentionality [23],
(3) intersubjectivity, and (4) practical reasoning, AIP and social intelligence.

Natural-language communication is frequently defined by interlocutors, their 
communicative intentions and by the performance of unilateral speech acts in context. 
But context, viz. cognitive, linguistic and social contexts, is not a unilateral, but rather
a relational notion which cannot be reduced to a micro context only. Rather, it is
represented by interdependent layers, to employ the onion metaphor [25], or by 
interdependent frames [11]. For this reason, the retrieval of a communicative 
contribution’s micro contextual references is necessarily connected with the macro 
category of coherence and thus with a communicative-genre frame of reference and 
its constitutive discourse topic(s). This extension of frame and the change of 
perspective from a micro-contextual, bottom-up approach to a macro-contextual, top- 
down frame of reference has the necessary consequence that the coparticipants' 
production and interpretation of communicative meaning can no longer be restricted 
to their individual intentions only. Instead, the production and interpretation of 
intersubjective meaning must be based on the Searlean conceptions of collective 
intentionality and We-intention [22, 23, 24], on Dascal's [5] conception of collective 
we-intention and on Brandom’s [2] conception of I-thou sociality. Against this 
background, communicative meaning is calculated in accordance with the particular 
macro constraints of a communicative genre, which filter the production and 
interpretation of intersubjective meaning accordingly. But is a communicative genre 
performed intentionally? 

The intentionality of action is not only a core concept in the research paradigm of 
natural-language communication but also in the field of artificial intelligence with 
respect to the question of how longer stretches of talk are processed. What is of 
relevance for this investigation about the connectedness between intentions and 
context is Litman and Allen’s [17:376] process-oriented differentiation between 
discourse intentions or plans of a speaker and plans generated by these plans: 
“Discourse intentions are purposes of the speaker, expressed in terms of both the task 
plans of the speaker (the domain plans) and the plans recursively generated by the 
plans (the discourse plans)”. In other words, a speaker may have a particular discourse 
intention, for instance to conduct an interview about renting an apartment, but s/he 
cannot plan every single action or task by him/herself because a plan is a dynamic 
construct which may generate different subplans, if the immediate context requires a 
change of action. Moreover, if Litman and Allen’s conception of a discourse 
intention is adapted to a dialogue setting, discourse intentions are not only postulated, 
but also have to be ratified in interaction by accepting or rejecting them. Furthermore, 
like any other pragmatic presuppositions, discourse intentions can be represented both 
explicitly and implicitly. Analogously to the explicit or implicit linguistic 
representation of intentions in speech act theory, Dascal [5] differentiates between 
overt and covert collective we-intentions. 

The dynamic nature of discourse is also reflected in the processing mode 
employed. Contrary to the bottom-up processing of single actions, discourse 
processing, as is argued by Litman and Allen [17:380], requires a top-down approach: 
“Once a set of discourse and domain plans is recognized, each is expanded top down 
by adding the definitions of all steps and substeps (based on the plan libraries), until 
there are no unique expansions for any of the remaining substeps.” If the artificial- 
intelligence setting is adapted to the constraints and requirements of natural-language 
communication, the recognition of ‘a set of discourse and domain plans’ can be 
compared and contrasted with the hearer’s calculation and recognition of the 
speaker’s communicative intention and the corresponding inference processes 
involved. But are these two tasks really equivalent? As regards non-complex plans, 
such as a request to pass the vinegar, the two domains can be equated. As regards 
complex plans, however, for instance the performance of the communicative genre of 
an interview, we have to differentiate between a discourse or a macro intention 
anchored to the macro category of a communicative genre as a whole, and a 
communicative or micro intention anchored to the performance of an individual 
communicative contribution as a part (of a whole). Yet what kind of relationship is 
there between a micro or an individual I-intention and a macro or a collective we- 
intention? 
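
The top-down expansion quoted from Litman and Allen can be illustrated with a short sketch. It is a minimal sketch under stated assumptions, not their system: the plan library, the plan names (which echo the apartment-rental interview mentioned above) and the function `expand_top_down` are hypothetical. For simplicity, the plan is flattened into its sequence of remaining steps rather than kept as a tree.

```python
# Hypothetical plan library (an illustrative assumption, not Litman and
# Allen's own): each plan maps to its alternative decompositions, each a list
# of substeps. Steps absent from the library are treated as primitive.
# The library is assumed to be acyclic; otherwise expansion would not stop.
PLAN_LIBRARY: dict[str, list[list[str]]] = {
    "conduct-interview": [["open-interview", "ask-questions", "close-interview"]],
    "open-interview": [["greet", "state-purpose"]],
    "ask-questions": [
        ["ask-about-rent", "ask-about-size"],  # two alternative subplans:
        ["ask-about-location"],                # the choice is left open
    ],
    "close-interview": [["thank", "say-goodbye"]],
}


def expand_top_down(plan: str, library: dict[str, list[list[str]]]) -> list[str]:
    """Repeatedly replace each step by its definition while that definition
    is unique; stop when no remaining step has a unique expansion."""
    steps = [plan]
    changed = True
    while changed:
        changed = False
        expanded: list[str] = []
        for step in steps:
            options = library.get(step, [])
            if len(options) == 1:      # unique expansion: add its substeps
                expanded.extend(options[0])
                changed = True
            else:                      # primitive or ambiguous: keep as is
                expanded.append(step)
        steps = expanded
    return steps


print(expand_top_down("conduct-interview", PLAN_LIBRARY))
```

The step `ask-questions` stays unexpanded because the library offers two alternatives for it. In a dialogue, it is the immediate context that decides between them, which fits the view of a plan as a dynamic construct that may generate different subplans.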

In his investigation of the connection between intentionality and 
conversation, Searle [22:400] explicitly stresses that: 

[c]ollective intentional behavior is a primitive phenomenon that cannot be 
analyzed as just the summation of individual intentional behavior; and 
collective intentions expressed in the form “we intend to do such-and-such” 
or “we are doing such-and-such” are also primitive phenomena and cannot 
be analyzed in terms of individual intentions expressed in the form “I intend 
to do such-and-such” or “I am doing such-and-such” 

Against this background, we-intentions of collective intentionality cannot be 
reduced to the summation of individual I-intentions. Instead, they are intrinsically 
linked to the macro category of communicative genre, or to employ Searle’s [22:406] 
own words: “The reason that we-intentions cannot be reduced to I-intentions, even I- 
intentions supplemented with beliefs and beliefs about mutual beliefs, can be stated 
quite generally. The notion of a we-intention of collective intentionality, implies the 
notion of cooperation.” For this reason, we-intentions are necessarily anchored to 
dialogue, viz. to a frame of reference that goes beyond an individual communicative 
contribution, which, if conceived as a part in a whole, “is derivative from the 
collective intentionality ‘we are doing act A’” [22:403]. Thus, the concept of we- 
intentionality is a context-dependent notion par excellence. Yet it is not only anchored 
to cognitive contexts but also, as explicated and specified in Searle [23], to social 
contexts and social reality, which are further refined in Searle [24:109] into 
background presuppositions anchored to sociocultural context that vary from culture 
to culture. Social reality and social context are constructed through collective 
representation in accordance with constitutive rules and collective acceptance. 
Moreover, we-intentionality is anchored to the dialogue principle of cooperation: 
“Collective intentionality presupposes a Background sense of the other as a candidate 
for cooperative agency; that is, it presupposes a sense of the others as more than just 
conscious agents, indeed as actual or potential members of a cooperative activity.” 
[22:414]. 

The premise of cooperation, on which Searle’s definition of we-intentionality is 
based, is further refined by Brandom [2:508] with regard to the more basic category of 
I-thou sociality: 

The social distinction between the fundamental deontic attitudes of 
undertaking and attributing is essential to the institution of deontic statuses 
and the conferral of propositional contents. This is, (...) an I-thou sociality 
rather than an I-we sociality. Its basic building block is the relation between 
an audience that is attributing commitments and thereby keeping score and a 
speaker who is undertaking commitments, on whom score is being kept. The 
notion of a discursive community - a we - is to be built up out of these 
communicating components. 



This intersubjective stance is further refined by Grosz and Sidner [14:427], who 
underline the cooperative and collaborative nature of dialogue as follows: “To 
account for extended sequences of utterances, it is necessary to realize that two agents 
may develop a plan together rather than merely execute the existing plan of one of 
them. That is, language use is more accurately characterized as a collaborative 
behavior of multiple active participants.” 

Collaborative behaviour is also a key concept in the language-as-social-action 
paradigm, where it applies to the joint production of a communicative genre, the joint 
production of communicative contributions and the joint production of utterances. 
Following Goody [12:26], the macro category of a communicative genre is conceived 
of as “socially constructed models for the solution of specific types [of] communicative 
problems”. That is, a speech community provides particular ‘plans’, or particular we- 
intended macro propositions, in and through which specific types of communicative 
actions are performed. For instance, the task of seeking and providing information is 
generally performed in and through the communicative genre of an interview; the task 
of influencing and persuading people is generally performed in and through the 
communicative genres of a speech or a debate. 

The inferencing processes involved in natural-language communication are the 
standard ones, namely deduction, by which one infers specific instances from a 
general rule, induction, by which one presumably discovers the general rule from a 
representative sample of specific instances, and abduction, by which one reasons by 
hypothesis from instances or general rules to their wider context. Givon [8:14] 
stresses the decisive difference between abduction and the deductive procedure of 
inferring specific instances from a general rule, “[t]his mode of hypothesis 
[abduction, A.F.] often involves analogical reasoning, and thus the pragmatic, 
context-dependent notions of similarity and relevance”. The importance of abductive 
reasoning is also stressed by Levinson’s [16:230] reference to Aristotle: “As Aristotle 
argued, the logic of action is a distinct species of non-monotonic (defeasible) 
reasoning, a practical reasoning (PR) as it has been dubbed by philosophers.” Thus, 
practical actions do not exist as such but are performed by social actors who act in a 
rational manner and are, for this reason, in a position to account for their social 
actions. Garfinkel [7:3] explicates the process and product of the accountability of 
social action as follows: 

(1) Whenever a member is required to demonstrate that an account analyses 
an actual situation, he invariably makes use of the practices of "et cetera", 
"unless" and "let it pass" to demonstrate the rationality of his achievement. 

(2) The definite and sensible character of the matter that is being reported is 
settled by an assignment that reporter and auditor make to each other that 
each will have furnished whatever unstated understandings are required. 
Much therefore of what is actually reported is not mentioned. (3) Over the 
time for their delivery accounts are apt to require that "auditors" be willing to 
wait for what will have been said in order that the present significance of 
what has been said will become clear. (4) Like conversations, reputations, 
and careers, the particulars of accounts are built up step by step over the 
actual uses of and references to them. (5) An account’s materials are apt to 
depend heavily for sense upon their serial placement, upon their relevance of 
the auditor’s projects, or upon the developing course of the organizational 
occasions of their use. 

As regards the chronology of accounting for one’s actions and performing actions, 
social actors can only account for their social actions once they have processed and 
contextualized them by filling the gaps and finding the grounds to argue for the 
appropriateness of a social action. Through the process and product of accounting for 
social actions, which is anchored to a retrospective-prospective outlook on 
communication, social actors demonstrate substantive rationality and daily life 
rationalities. So far, reasoning has primarily been seen from the viewpoint of a hearer who 
calculates the meaning of an utterance with regard to the question of what the speaker 
intends to communicate. But what happens if a dialogue stance is adopted? 

Dialogue is based on collaboration and cooperation and therefore requires social 
intelligence, which manifests itself in interactive thinking [3] and AIP [12] as well as 
in the social-interaction notion of a learned program. The dyadic and dialogic 
conception of AIP is explicated by Goody [12:12] as follows: 

Both through inner speech, which is the sort of dialogue with ourselves 
(between me and I), and through our close attention to conversational 
partners, spoken language seems to have constructed a dialogue template for 
social cognition. In inner speech and in conversation, dialogue and the dyad 
are built into human cognition. 

(...) 

This highlights the complexity of AIP, which must both model contingent 
responses and model strategies for securing actions from others which are 
favourable to ego’s goals. AIP moves constantly back and forth between 
modelling and strategic action. 
To conclude, a social conception of language based on the (partly) joint production 
of communicative contributions and the joint production of communicative genres 
can no longer be based on an autonomous outlook on language. Instead, it requires a 
relational conception of language anchored to the collective category of 
communicative genre based on collaboration, cooperation [13], We-intentionality [23] 
and social intelligence [12]. Against this background, it seems more plausible to adopt 
a network perspective which employs parallel distributed processing [24, 25]. 



4 Conclusions 

A relational conception of language use in context requires both micro and macro 
categories, such as individual and social intelligence, I-intentionality and We- 
intentionality, communicative contribution and communicative genre, and 
coordination and cooperation. The dual status of communication as both cognitive 
and social and its consequences for the actual production and interpretation of 
utterances in context is succinctly formulated by Levinson [16:238]: 

Linguistic communication is fundamentally parasitic on the kind of 
reasoning about others’ intentions that Schelling and Grice have drawn 
attention to: no-one says what they mean, and indeed they couldn’t - the 
specificity and detail of ordinary communicated contents lies beyond the 
capabilities of the linguistic channel: speech is a much too slow and 
semantically undifferentiated medium to fill that role alone. But the study of 
linguistic pragmatics reveals that there are detailed ways in which such 
specific content can be suggested - by relying on some simple heuristics 
about the ‘normal way of putting things’ on the one hand, and the feedback 
potential and sequential constraints of conversational exchange on the other. 
Abstract. Our concern is explanation generation in a representation based 
on contextual categorization. We point out that the explicit consideration of 
context is necessary for the generation of relevant explanations. We show 
how the model captures the context necessary for explanation, and we report 
some results compatible with the hypothesis that explanation is based on 
categorical networks according to a model based on the Galois lattice. 



1 Introduction 

In the past 30 years, Artificial Intelligence research has aimed at developing automated 
reasoning systems and programs for solving problems. In the 1980s, the question was to 
develop systems that were able to explain their line of reasoning. Although at that 
time Artificial Intelligence was the science that explored the explanation process most 
deeply, an implicit assumption was to consider explanations as a vehicle 
for transferring information and knowledge from the machine to the user, supposing 
that the machine was the oracle and the user the novice [1]. Feedback was used by 
the system to tailor its explanation to the user's needs. However, such feedback was 
limited to signs of acceptance, and users could rarely intervene in the generation of 
explanations. An opposite position was taken in the SEPT application, which let the 
user build the explanation alone [2]. This was not a better solution, because users had 
to handle complex commands as a task additional to their work, under time 
constraints. A current perspective is that the user and the system must cooperate to 
jointly solve the problem and co-construct the explanation of the solution [1]. 

Because a good explanation is a contextualized explanation, the lack of contextual 
information about the task at hand has been recognized as a weakness of rule-based 
systems [4]: there is a lack of consideration for context, a recurrent problem in 
knowledge engineering. Our position is that explanation is based on context, and that 
context can be processed through a contextual categorization mechanism. 

Consider, for instance, the sentence “I heard a lion in my office this morning” [4]. 
This is an ambiguous sentence, since either the lion (interpretation a) or the person 
(interpretation b) can be in the office, with the other just outside it. Additional 
context such as “This is what the man says when he enters the police station” will 
favor the former, while “This is what the man says after he explains he was watching 
a TV program on animals in his office” will favor the latter. Our proposal is to 
demonstrate that and how an explanation rests on categories of actual objects using 
actual relational properties that the human cognitive system builds up. Starting with 
this sentence, object-person and object-lion are put in the same category because they 
have the properties of “being close to each other” and of both being located in the office 
building (the building that comprises the office). Subcategories are “the person” and “the 
lion” (Figure 1). 



[Figure 1: network of contextual categories. Category nodes: o: human beings (knowledge: can be attacked by lions); o: lions (knowledge: attack human beings); f: (inference: in the office building), (inference: close to each other); o: the particular lion; o: the person (hears the lion in the office).] 

Fig. 1. According to the contextual categorization theory, when listening to “I heard a lion in my 
office this morning”, 1. the actual objects “the person who is talking” and “the particular lion” are 
depicted as objects (o); 2. they are instantiated as subordinate categories with their specific 
features (f); 3. the categories /person/ and /particular lion/ are linked to their known categories 
and properties /human beings - can be attacked by lions/ and /lions - attack human beings/ and 
are linked to their contextual superordinate category of being close to each other (by inference, 
because the person hears the lion) and located in the same building (by inference, because they 
are close to each other in a place that comprises the office). Notice that the alternative 
interpretation, locating the actual lion in the office instead of the person, would exhibit the same 
network of contextual categories. 

From the network of contextual categories of Figure 1, one can tell the story as 
“This is a story about human beings and lions. In an office building there were a 
person and a lion, and the person heard the lion”. First, note that this description 
matches both interpretations a and b. This is one of the features of contextual 
categorization: the capability of extracting commonalities while computing 
contextual diversity. Another important feature is that a description/explanation can be 
generated by parsing the network of categories top-down. 
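
Top-down generation of a description from such a network can be sketched as follows. The encoding of the Figure 1 network as a dictionary, the predicate phrases attached to each category, and the function `describe` are our own illustrative assumptions, not the authors' implementation. Superordinate categories are verbalized before their subordinates. Because the network is a lattice rather than a tree (the person belongs to two superordinate categories), each category is verbalized only once.

```python
# Hypothetical encoding of the Figure 1 network: each category has predicate
# phrases (its features) and its subordinate categories (KIND-OF links).
NETWORK: dict[str, dict[str, list[str]]] = {
    "human beings": {"features": ["can be attacked by lions"], "subs": ["the person"]},
    "lions": {"features": ["attack human beings"], "subs": ["the particular lion"]},
    "the person and the particular lion": {
        "features": ["are in the office building", "are close to each other"],
        "subs": ["the person", "the particular lion"],
    },
    "the person": {"features": ["hears the lion"], "subs": []},
    "the particular lion": {"features": [], "subs": []},
}
TOP = ["human beings", "lions", "the person and the particular lion"]


def describe(network: dict[str, dict[str, list[str]]], top: list[str]) -> list[str]:
    """Pre-order (top-down) walk: a category is verbalized before its
    subcategories, and a category shared by several parents only once."""
    sentences: list[str] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        features = network[name]["features"]
        if features:  # a category without features adds no sentence
            text = f"{name} {' and '.join(features)}."
            sentences.append(text[0].upper() + text[1:])
        for sub in network[name]["subs"]:
            visit(sub)

    for name in top:
        visit(name)
    return sentences


for sentence in describe(NETWORK, TOP):
    print(sentence)
```

Since no feature of the network states where the lion is, the generated text, like the story above, fits interpretations a and b alike.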

Another context of the "hearing a lion" situation might be provided as follows: "I 
work at a university near a zoo that I can see from the window of my office. There 
are lions in that zoo. I often hear lions roar. That was just the case this morning." Then 
the network of categories relating to the situation is revised: the places of the lion 
(the zoo) and of the person (the office at the university) are differentiated, the 
university is near the zoo, and the superordinate known category for the actual 
lion is “lions in a zoo” (which do not attack human beings). 

The question addressed in this paper concerns the way in which a relevant 
explanation can be built from such a network of categories. A more general aim is to 
define the content and the format of an explanation according to the context. 

The paper is organized as follows. Section 2 introduces explanation. We propose 
a survey of current work on explanation, mainly on how explanation is 
generated, and we detail the parameters and factors that influence explanation. Section 
3 presents how context plays a key role in explanation generation by identifying 
different contexts. Then we describe the contextual categorization framework for 
building explanation (Section 4) and we report data that support our proposal. Finally, 
we conclude (Section 5) by discussing how our framework might be completed in 
order to be a general model of the construction of contextually based explanation. 



2 What Is Explanation? 

The word derives from the Latin explicatio(nem), from explicare, which literally means 
“to unfold”, itself from plicare, “to fold” (used until the 17th century). The word 
appeared in the French language in 1322 and was borrowed from French into 
English in 1528. 

According to general dictionaries (Larousse, Hachette) as well as 
dictionaries of psychology, explanation is both the action of explaining and a 
development intended to make something comprehensible. It can also mean an 
account of something or a clarification concerning a series of actions taken. In French, 
it can also mean the supervision of someone, a discussion, or quarrels concerning the 
supervision of someone. In cognitive psychology, an “explanation” occurs when a 
subject adopts a system of meaning, coherence, a presumption of a structure in a text 
or of a phenomenon. In a general sense, explanation has the aim of solving a problem 
of comprehension. In the broad sense, let us retain that explanation 
corresponds to any operation involved in building the understanding of a 
phenomenon. 

According to the goal of explaining, we can distinguish different kinds of 
explanation: account, alibi, annotation, apology, clarification, comment, commentary, 
definition, demonstration, description, elucidation, excuse, gloss, exemplification, 
explication, exposition, illumination, illustration, interpretation, justification, plea, 
reason, solution, indication. There are other kinds of explanation in French such as 
analogy, answer, argument, causes, controversy, debate, development, discussion, 
dispute, exegesis, explanation, exposure, hermeneutics, information, key, motive, 
note, notices, paraphrase, precision, reason, study, talk, translation. 

All the different kinds of explanation have content. They are also constructed 
according to a format and they are communicated through a medium. Explanation 
also depends on other factors such as the purpose of the explanation, the explainer and 
the explainee, feedback and task constraints. In our approach, we consider 
explanation between two actors: two people, or a person and a machine. Our research 
is restricted to the content of explanation and its format. 

Content. The content of any explanation comprises the phenomenon to be 
explained either explicitly or implicitly, and additional information. Additional 
information (1) has to be related to what is to be explained either as knowledge or as 
inference, and (2) has to be drawn from the same domain or, if drawn from a different 
domain, should show similar relations between elements (analogy). Finally, the 
content has to integrate contextual information in order to highlight the phenomenon 
to be explained. 

Format. The content has a certain structure, a format that sequentially structures 
the content. For example, "A whale is said to be a mammal because whales breast-feed 
their young" is one kind of format. "Whales breast-feed their young. A whale is 
said to be a mammal" is another kind of format. 
3 The Context, a Key Factor for Explanation Generation 

3.1 Preliminary 

Mackie [5] has already stressed the context-dependency of explanation as a process of 
making a distinction between some current situation and another class of situations. 
Thus, context, involving both the explainer's beliefs and goals, is crucial in deciding how 
good an explanation is, and a theory of contextual influences can be used to determine 
which explanations are appropriate. 

Leake [6] considers the relationships between explanations and context in the 
framework of case-based reasoning. An explanation is required when there is a 
conflict between an event and a model that we have of the place where the event 
occurs. Leake argues that such a conflict is a property of the interaction between 
events and context: Any particular fact can be anomalous or non-anomalous, 
depending on the situation and on the processing we are doing. For example, 25°C 
may be considered hot weather for Paris, France, but cold for Rio de Janeiro, Brazil. 
To be relevant to an anomaly, explanations must resolve a belief conflict underlying 
the anomaly. To resolve an anomaly, an explanation must account for why prior 
reasoning led to false expectations or beliefs. Any anomaly would allow the retrieval 
of explanations for identical anomalies, provided that the same anomaly was always 
described the same way and that distinct anomalies always received distinct 
characterization. Finally, Leake lists ten major explanation purposes triggered by 
anomalies that rely on several elements of context (expected/believed conditions, 
previously unexpected conditions, possible repair points, actor's motivations, etc.). 
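
Leake's point that the same fact may or may not be anomalous depending on context can be sketched with a toy expectation model. The function `anomaly` and the temperature bounds are hypothetical. The bounds are chosen only so that the sketch reproduces the 25°C example above; they are not climate data.

```python
from typing import Optional

# Hypothetical expectation models: the temperature range (in °C) an observer
# would find unremarkable in each place. The bounds are illustrative
# assumptions chosen only to reproduce the example in the text; they are not
# climate data.
EXPECTED_RANGE: dict[str, tuple[float, float]] = {
    "Paris": (0.0, 24.0),
    "Rio de Janeiro": (26.0, 38.0),
}


def anomaly(place: str, temperature: float) -> Optional[str]:
    """Describe the conflict between an observed fact and the model of the
    place where it occurs, or return None when there is no conflict."""
    low, high = EXPECTED_RANGE[place]
    if temperature > high:
        return f"{temperature} °C is hot for {place}"
    if temperature < low:
        return f"{temperature} °C is cold for {place}"
    return None


for city in EXPECTED_RANGE:
    print(city, "->", anomaly(city, 25.0))
```

The same observation yields two different anomaly descriptions, and thus calls for two different explanations, depending only on which context model is consulted.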

Thus, explanation and context are strongly intertwined. Explanations make context 
explicit in order to clarify a step in the reasoning process. They are a means to point 
out the links between the problem at hand and shared knowledge in its current state. 

The way in which an explanation must be chosen and generated depends 
essentially on the context in which the two actors find themselves. An explanation 
always takes place relative to a space of alternatives that require different 
explanations according to the current context. Comparing two explanations leads to 
seeing how their contextual spaces differ. Thus, taking context into account is 
necessary to study explanation [7]. 



3.2 The Different Contexts 

From an engineering point of view, the context is a collection of relevant conditions 
and surrounding influences that make a situation unique and comprehensible [8]. 
However, there are other points of view on context. In the accomplishment of a task, a 
person identifies which knowledge is relevant to the job based on previous 
experience. What Brezillon and Pomerol [3] call “contextual knowledge” are pieces 
of knowledge judged relevant and which can be mobilized at a specific step in the 
decision making process. A subset of the contextual knowledge at that step is 
invoked, structured and situated according to the focus corresponding to the step in 
the decision making process. This subset is called the proceduralized context. 
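
The selection step of proceduralization (contextual knowledge filtered by the current focus) can be sketched as follows. The class `Piece`, the tagging of knowledge pieces with decision steps, the function `proceduralize`, and the driving data are hypothetical illustrations. The sketch models only the selection of the subset, not the way that subset is then structured and situated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """A piece of contextual knowledge, tagged (hypothetically) with the
    decision-making steps at which it can be mobilized."""
    text: str
    relevant_to: frozenset[str]


def proceduralize(contextual_knowledge: list[Piece], focus: str) -> list[str]:
    """Return the proceduralized context: the subset of contextual knowledge
    invoked for the step currently in focus (original order preserved)."""
    return [p.text for p in contextual_knowledge if focus in p.relevant_to]


# Hypothetical contextual knowledge of a driver.
KNOWLEDGE = [
    Piece("the speed limit in town is 50 km/h", frozenset({"choose speed"})),
    Piece("the road is wet", frozenset({"choose speed", "brake"})),
    Piece("the fuel tank is half full", frozenset({"plan a stop"})),
]

print(proceduralize(KNOWLEDGE, "choose speed"))
```

Changing the focus to another step mobilizes a different subset of the same contextual knowledge.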
An important issue is the transition from contextual knowledge to the 
proceduralized context. Proceduralization depends on the focus on a task, even for a 
political conversation which can be based on how a political program should work. 
Thus, proceduralization, like “know how”, is task-oriented and is often triggered by 
an event or primed by the recognition of a pattern. Another aspect of 
proceduralization is that people transform contextual knowledge into functional 
knowledge or causal and consequential reasoning in order to anticipate the result of 
their own actions. Proceduralization requires a consistent explicative framework in 
order to anticipate the results of a decision or an action. This consistency is obtained 
by reasoning about causes and consequences in a given situation. We can thus 
separate reasoning into diagnosing the real context and anticipating what follows. 
The second step requires conscious reasoning about causes and consequences. 

A second aspect of proceduralization concerns a kind of instantiation. This means 
that the contextual knowledge or background context needs further specification to fit 
the task at hand. Precision and specification brought to bear on the contextual 
knowledge are also a part of the proceduralization process. For instance, it has been 
shown [9] that there are different levels of context, from the more general to the more 
specific and heterogeneous. A context at one level (e.g. the group context) contains 
rules that are instantiated at the level below (e.g. individual contexts). For example, 
when the rule is a speed limit of 50 km/h in a city (group context), a driver will 
control the speed of his vehicle using the accelerator and brake pedals (individual 
context). 



4 Contextual Categorization 

Contextual categorization is a component of diverse theories or models of human 
cognition, including perceptual categorization [10], categorization for text 
understanding [11], and task-oriented categorization [12]. We describe how contextual 
categorization theory works, how it can help to model the generation of 
explanation, and how it can be used to model the transition from contextual 
knowledge to proceduralized knowledge. Contextual categorization is founded on a 
basic and simple mechanism that is applied to process environmental inputs of any 
kind. It is of interest because it has two main results: (i) it shows the organization of 
both the objects and the properties present, or, more precisely, how properties are 
distributed to form categories of objects, and (ii) it reflects the organization of the 
world. The contextual categorization model operates on Galois lattices to create a 
single hierarchy of categories with transitivity, asymmetry and irreflexivity, when 
given the Boolean table X, which indicates for each of the n objects in O whether 
it does or does not have each of the m properties in P. The maximum number of 
categories is either $2^n - 1$, or $m$ if $m < 2^n - 1$, that is, $\min(2^n - 1, m)$, in a lattice whose complexity depends on 
the way properties are distributed across objects (Table 1). For instance, with three 
objects (a, b, c), the maximum number of categories is seven: the categories that 
factorize the properties shared by abc, ab, ac and bc respectively, and the categories that 
comprehend the unshared properties of a, b and c respectively. 
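
The bound on the number of categories can be checked with a small sketch. We assume that a "category" is a lattice node defined by at least one property, that is, by the set of objects having that property. Under this reading, each category is a non-empty subset of the n objects (at most $2^n - 1$ of them) and there is at most one new category per property (at most m). The functions `property_categories` and `max_categories` are hypothetical names.

```python
from itertools import combinations


def property_categories(objects: list[str], table: dict[str, set[str]]) -> set[frozenset[str]]:
    """Distinct sets of objects that share some property.

    Assumption: a 'category' is read as a lattice node defined by at least
    one property, i.e. by the set of objects having that property. `table`
    maps each object to the properties it has (the 1s of the Boolean table).
    """
    properties = set().union(*table.values())
    return {frozenset(o for o in objects if p in table[o]) for p in properties}


def max_categories(n: int, m: int) -> int:
    """Bound stated in the text: 2**n - 1, or m if m < 2**n - 1."""
    return min(2 ** n - 1, m)


# Three objects with one property for every non-empty subset of objects,
# which is the distribution that reaches the maximum.
objects = ["a", "b", "c"]
table: dict[str, set[str]] = {o: set() for o in objects}
for k in range(1, len(objects) + 1):
    for subset in combinations(objects, k):
        for o in subset:
            table[o].add("shared by " + "".join(subset))

m = len(set().union(*table.values()))
print(len(property_categories(objects, table)), max_categories(len(objects), m))
```

With fewer properties, or with properties distributed identically across objects, fewer categories appear. For example, two objects that both have the same two properties yield a single category.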

The Galois lattice corresponding to the binary description in Table 1 is shown in 
Figure 1 as a hierarchy of categories of objects defined by properties. The link 
between categories is a "KIND-OF" link: Y is a kind of X. Due to the inheritance 
principle, category Y includes the properties of category X. This can be seen in the 
Boolean table in Table 1. The Galois lattice can be used to build a hierarchy of 
categories obtained by factorizing shared properties [13]. The categories are 
contextual because they are a function of what objects are in the current situation 
rather than simply a function of pre-existing categories in long-term memory. This is 
an alternative to case-based theories that need to encode each of the contexts in which 
an object could be met. The hierarchy of categories provides a circumstantial and 
contextual structure of the objects present. What is fundamental to contextual 
categorization is that it computes each unique object in the 
context of all the other objects that form its unique context. 



Table 1. A binary description of “I heard a lion in my office this morning” that corresponds to 
the Galois lattice in Figure 1 

                                        human    the      lions    the particular
                                        beings   person            lion
hears the lion                          0        1        0        0
(knowledge: attack human beings)        0        0        1        1
(inference: in the office building)     0        1        0        1
(inference: close to each other)        0        1        0        1
(knowledge: can be attacked by lions)   1        1        0        0
in the office                           0        1        0        0
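
The Galois lattice of Table 1 and its KIND-OF links can be computed with a brute-force sketch of formal concept analysis. This is adequate only for a handful of objects and is not the authors' implementation. The placement of the 1s in Table 1 follows our reading of Figure 1 and its caption, and the names `TABLE_1`, `galois_lattice` and `kind_of_links` are hypothetical.

```python
from itertools import combinations

# Table 1 as object -> properties (the 1s). The placement of the 1s in the
# columns is our reading of Figure 1 and its caption, i.e. an assumption.
TABLE_1: dict[str, set[str]] = {
    "human beings": {"can be attacked by lions"},
    "the person": {"can be attacked by lions", "in the office building",
                   "close to each other", "hears the lion", "in the office"},
    "lions": {"attack human beings"},
    "the particular lion": {"attack human beings", "in the office building",
                            "close to each other"},
}


def galois_lattice(table: dict[str, set[str]]) -> dict[frozenset[str], frozenset[str]]:
    """Map the extent (objects) of every formal concept to its intent
    (shared properties), by closing every subset of objects (brute force)."""
    objects = list(table)
    props = set().union(*table.values())
    lattice: dict[frozenset[str], frozenset[str]] = {}
    for k in range(len(objects) + 1):
        for subset in combinations(objects, k):
            intent = props.intersection(*(table[o] for o in subset))
            extent = frozenset(o for o in objects if intent <= table[o])
            lattice[extent] = frozenset(intent)
    return lattice


def kind_of_links(lattice):
    """Pairs (Y, X) such that Y is a kind of X: Y's extent lies strictly
    inside X's, with no concept in between (covering relation)."""
    return [(y, x) for y in lattice for x in lattice
            if y < x and not any(y < z < x for z in lattice)]


lattice = galois_lattice(TABLE_1)
for y, x in kind_of_links(lattice):
    assert lattice[y] >= lattice[x]  # inheritance: Y has all of X's properties

for extent, intent in sorted(lattice.items(), key=lambda kv: -len(kv[0])):
    if extent:
        print(sorted(extent), "->", sorted(intent))

# Interpretation a: the lion, rather than the person, is in the office.
alt = {o: set(ps) for o, ps in TABLE_1.items()}
alt["the person"].discard("in the office")
alt["the particular lion"].add("in the office")
print(set(galois_lattice(alt)) == set(lattice))  # True: same categories of objects
```

The concept whose extent is {the person, the particular lion} and whose intent is {in the office building, close to each other} corresponds to the contextual superordinate category of the Figure 1 caption. Moving "in the office" from the person to the lion leaves the set of category extents unchanged, which matches the caption's remark about the alternative interpretation.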



Explanation is based on description. We propose, first, that contextual 
categorization is the mechanism that is used to describe the phenomenon to be 
explained; second, that the description is obtained by constructing the Galois 
lattice of the situation including the context; and third, that the explanation is constructed 
syntactically by parsing the Galois lattice. 



4.1 Building a Description for Explanation 

Consider the material shown in Figure 2. It is an example of a set of characters we use 
in our experiments. We use such material for simplicity’s sake; however, the objects 
in other situations might be, for instance, the cars involved in an accident or different 
results obtained from experiments. Each display contains a number of objects. 
One object differs from all the others (i.e., an intrusive object), and participants are 
asked to detect the intrusive object [14] and to explain the way in which it is different. 
The intrusive object is the only specimen in its category. For example, in the sample 
presented in Figure 2, the intrusive object is i: all the objects are letters, both 
vowels and consonants, but i is the only vowel and is thus the intrusive object. 

The goal of this particular experiment was to demonstrate that both detection and 
description are based on contextual categorization. 

From a cognitive point of view, seeking an intrusive object means seeking the object 
that has the fewest characteristics in common with the other objects in the set. To 
complete the task, the participant has to use a categorization process that considers the 
relevant properties of the situation (contextual properties). Once the contextual 
network of categories has been built, subjects easily evaluate the number of properties 
shared by each object in the set. Thus, they are able to indicate the object that has the 
fewest properties in common with the others. We call this object the intrusive object. 
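The selection rule just described (pick the object that shares the fewest properties with the others) can be sketched as follows. Tallying shared properties pairwise and summing over the other objects is our assumption about how "properties in common" are counted, and the four-letter display and its property sets are hypothetical, not the material of Figure 2.

```python
def shared_property_counts(display):
    """For each object, the total number of properties it shares with every other object."""
    return {
        name: sum(len(props & other_props)
                  for other, other_props in display.items() if other != name)
        for name, props in display.items()
    }


def intrusive_object(display):
    """The object with the fewest properties in common with the others."""
    counts = shared_property_counts(display)
    return min(counts, key=counts.get)


# Hypothetical display: three consonants and one vowel.
display = {
    "b": {"character", "letter", "consonant"},
    "k": {"character", "letter", "consonant"},
    "i": {"character", "letter", "vowel"},
    "t": {"character", "letter", "consonant"},
}
print(shared_property_counts(display))  # {'b': 8, 'k': 8, 'i': 6, 't': 8}
print(intrusive_object(display))        # i
```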
Fig. 2. An example of a set of objects to be described 

Participants can be asked to explain their choices and to justify why they chose an 
object as being the intruder. Thus, we can observe whether they exploit the network 
of contextual categories that can be built from the display. In other words, we are 
interested in the process of construction of explanation with the hypothesis that the 
structure of explanation is related to the structure of the information considered, 
which is to say the relation between contextual knowledge and proceduralization. 

Consider the following set of 9 objects: “3,2”, “a small rectangle”, “E”, “a beetle”, 
“π”, “51”, “a dog”, “H”, “a large square”. They can be put in categories that both 
group and differentiate them. For instance, “3,2” and “51” might be considered 
“numbers”, while “rectangle” and “square” can be grouped as “geometric shapes” and 
“H” and “E” as “roman letters”. In addition, “3,2”, “51”, “H”, “E” and 
“π” can all be seen as “characters”. 

Figure 3 provides the network of categories in which the objects can be 
simultaneously grouped and differentiated. Our assumption is that a participant 
who has to name the category to which the intrusive object belongs has to differentiate 
it from its context and will use the most specific level that differentiates the intrusive 
object from its context. For instance, “E” surrounded by squares and rectangles can be 
called a letter, and the justification for choosing “E” as intrusive can be “because it is 
the only letter”. In contrast, “E” surrounded by “H” and “π” will be considered “the 
only vowel”. 

First, this recognition task makes it possible to control the context of the 
intruding object. Indeed, placing “E” among consonants or among large rectangles does not 
produce the same context effects. Second, we can vary the context surrounding the 
intruder with objects belonging to categories more or less distant from it. In the 
example referred to above, if “E” is surrounded by consonants, it is placed in a 
context that is semantically close: only two arcs join the vowel 
category to the consonant category. In contrast, if we place “E” in a set of “large 
rectangles”, seven arcs are necessary to connect these two categories: the context of 
the intrusive object “E” is more distant. 
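The semantic distance used here, the number of arcs separating two categories, amounts to a path length in a tree through the lowest common ancestor. The sketch below assumes a tree-shaped hierarchy stored as a hypothetical child-to-parent map; the category names are an illustrative fragment, not a transcription of Figure 3, so only the vowel-consonant distance of two arcs is meant to match the text.

```python
# Hypothetical fragment of a category hierarchy: child -> parent.
PARENT = {
    "characters": "objects",
    "letters": "characters",
    "numbers": "characters",
    "vowels": "letters",
    "consonants": "letters",
    "shapes": "objects",
    "rectangles": "shapes",
}


def path_to_root(node, parent):
    """The node followed by all of its ancestors, up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def arc_distance(a, b, parent):
    """Number of arcs on the tree path joining categories a and b."""
    up_a = path_to_root(a, parent)
    up_b = path_to_root(b, parent)
    on_b = set(up_b)
    # Lowest common ancestor: first node on a's path that also lies on b's path.
    lca = next(n for n in up_a if n in on_b)
    return up_a.index(lca) + up_b.index(lca)


print(arc_distance("vowels", "consonants", PARENT))  # 2
print(arc_distance("vowels", "rectangles", PARENT))  # 5
```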

We tested the material in a pre-experiment: 84.05 % of the responses obtained 
from 41 participants fit the predictions. These results are compatible with 
Treisman & Gelade’s pop-out theory [15], although we explain the effect in terms of 
categorization and context rather than in terms of filter theory. In addition, we found 
that the more distant the context was from the intrusive object, the richer the explanation 
became. In such cases, participants do not systematically consider that naming the 
distinctive category of the intrusive object provides a sufficient explanation. 
Moreover, we observed another kind of context effect: it seems easier to 
detect a consonant among vowels than a vowel among consonants. Consequently, 
some contexts simplify the task. We suppose that this effect comes from the 
number of objects included in the contextual category. Indeed, the vowel category 
has six instances (a, e, i, o, u and y), whereas the consonant category 
comprises 20. 




Fig. 3. The hierarchy of the set of categories that structures the set of 9 objects: “3,2”, “a small 
rectangle”, “E”, “a beetle”, “π”, “51”, “a dog”, “H”, “a large square”. 



4.2 Building a Structure for Explanation 

Explanation is the process of providing information to someone who already 
has some prerequisite knowledge (which should first be evaluated). For instance, in 
order to explain what a duck is, "a duck" can be defined as being an "animal", like a 
"chicken", but one that goes in the water. This example is based on a specific kind of 
explanation, description, in which causal links between events and objects, and hence time, are not 
considered. Description rests only on our categorized knowledge to provide names 
of objects (categories) and properties. 

Such explanation-based verbalization was studied by Ganet [16], using data 
collected by Faure [17] from participants who had to judge the similarity between two 
sounds (sound 1 and sound 2) and to explain their judgment. To do so, twelve sounds 
were presented in pairs, yielding 12 × 11 = 132 verbalizations from each of 20 listeners. 
We reasoned that if explanation-based verbalization were computed as predicted by 
contextual categorization, then participants should provide properties in their 
description in a strict order, revealing how they proceduralize their contextual 
knowledge. 

We labeled "A" the properties common to both sounds (as in the sentence "the two 
sounds are soft"), "B" the properties describing sound 1 (as in the sentence "the first 
sound is rich and hot"), "C" the properties describing sound 2 (as in the sentence "the 
second sound is dry"), "D" the relational properties used to compare sound 2 to sound 
1 (as in the sentence "but the second sound is more brilliant than the first") and "E" 
the relational properties used to compare sound 1 to sound 2 (as in the sentence "the 
first sound was longer than the second"). An explanation such as "I found the two 
sounds dissimilar because if they are both quite hot, the second is brilliant and longer 
than the first" was then coded as an "ACD" explanation, while an explanation such as 
"the second sound was dry but shorter than the first which was soft and the two are 
low" was coded as a "CDBA" explanation. Contextual categorization and 
listening order from first to second sound predict that participants should build their 
explanation in an "A then B then C then D then E" order, that is, following the 
structured explanation-based verbalization "ABCDE". Among 325 possible 
formats, 31 (for example "ADE", "AC", "A", "DE", "D") are compatible with the 
predictions, while the remaining 294 formats (e.g., "BA", "CDBA", "ACBDE", "ED") are 
not predicted by contextual categorization. 
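The counts of 325 possible formats and 31 predicted formats can be checked by enumeration, assuming (as the total of 325 implies) that a format is an ordered sequence of distinct feature types. A format is compatible with the prediction when its types appear in the canonical A-B-C-D-E order, that is, when it is a non-empty subsequence of "ABCDE"; there are 2^5 - 1 = 31 such subsequences.

```python
from itertools import permutations

TYPES = "ABCDE"


def all_formats(types=TYPES):
    """Every ordered, non-empty sequence of distinct feature types."""
    return ["".join(p)
            for k in range(1, len(types) + 1)
            for p in permutations(types, k)]


def is_predicted(code, order=TYPES):
    """True if the feature types in `code` appear in the canonical A-B-C-D-E order."""
    positions = [order.index(t) for t in code]
    return all(a < b for a, b in zip(positions, positions[1:]))


formats = all_formats()
predicted = [f for f in formats if is_predicted(f)]
print(len(formats), len(predicted), len(formats) - len(predicted))  # 325 31 294
```

The same predicate can be used to code collected verbalizations: `is_predicted("ACD")` is true, while `is_predicted("CDBA")` is false.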

Table 2 shows the percentage of each type of feature (A, B, C, D and E) given as the 
first, second, third, fourth and fifth feature by the participants. The most frequent first 
feature was of type A (63 %). The most frequent second feature was of type B, 
and so on. Although only 31 of 325 types (9.5 %) of possible verbalizations were 
compatible with contextual categorization, 53 % of the verbalizations corresponded to 
our strict predictions. Moreover, the most frequent verbalization was of the ABC 
form (10 %), followed by ABCD (6 %), A (5.2 %), AD (5 %) and BC (4.2 %). Each 
of the predicted forms was six times more frequent than the non-predicted forms. In 
addition, the 13 most frequent forms of explanation accounted for 54 % of all 
verbalizations. Among them, ten were predicted (the unpredicted ones were AED, BCA and 
ABDA), and these ten accounted for 78 % of them. In summary, contextual categorization 
appears to be a good approximation for modeling such human descriptive explanations. 
More importantly, these results allow us to continue our research and to explore more 
deeply the explanatory process and its other types of explanation. 

Table 2. Percentage of features of type A, B, C, D and E as a function of their position (t1 to t5) 
in the verbal explanation. 63 % of the first features given were of type A, 37 % of the second 
features were of type B, and so on. 

        t1      t2      t3      t4      t5 
A     63.12    4.38   10.98    9.86    9.35 
B     20.50   37.37   11.53    9.13    5.61 
C      4.12   23.71   40.07   21.88   21.50 
D      3.95   17.61   27.88   36.78   34.58 
E      8.31   16.92    9.55   22.36   28.97 
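As a quick consistency check on Table 2, the sketch below encodes its values, verifies that the percentages at each position sum to about 100 %, and reports the most frequent feature type at each position. The encoding and function names are ours, for illustration only.

```python
# Table 2: percentage of each feature type (rows) at each position t1..t5 (columns).
TABLE_2 = {
    "A": [63.12, 4.38, 10.98, 9.86, 9.35],
    "B": [20.50, 37.37, 11.53, 9.13, 5.61],
    "C": [4.12, 23.71, 40.07, 21.88, 21.50],
    "D": [3.95, 17.61, 27.88, 36.78, 34.58],
    "E": [8.31, 16.92, 9.55, 22.36, 28.97],
}
POSITIONS = ["t1", "t2", "t3", "t4", "t5"]


def column_totals(table):
    """Sum of the percentages at each position (about 100 up to rounding)."""
    return [round(sum(row[i] for row in table.values()), 2)
            for i in range(len(POSITIONS))]


def modal_type(table):
    """Most frequent feature type at each position."""
    return {pos: max(table, key=lambda t: table[t][i])
            for i, pos in enumerate(POSITIONS)}


print(column_totals(TABLE_2))
print(modal_type(TABLE_2))  # {'t1': 'A', 't2': 'B', 't3': 'C', 't4': 'D', 't5': 'D'}
```

On these figures, type D rather than E is the most frequent fifth feature (34.58 versus 28.97).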
5 Conclusion 

Earlier expert systems, such as MYCIN, aligned their explanation facility directly with the 
reasoning paths that define movement across contexts of the diagnostic system. Thus, 
all traces of reasoning that represent the traversed contexts are kept and their contents 
provided to the user as explanation. Here, the definition of context is restricted to 
knowledge and to inference steps, but MYCIN is not selective in its construction 
of explanations. 

In other systems, such as that described by Wick [18], an explanation facility is 
aligned only periodically with the reasoning of the system. In Wick’s system, only 
some parts of the contexts that the system reasons with are explained to the user. In 
this approach, additional explanatory knowledge (domain knowledge and 
expertise that are not directly necessary for the task at hand) may be used to generate 
enhanced explanations. This implies that the explanation path separates from the path 
of reasoning to produce effective explanations. Context here is an extended version of 
the previous one because it also contains domain and task knowledge not directly 
used in the reasoning of the problem solving, and possibly some information 
about users through a user model. One problem with such an approach is that it may be 
unsuitable for critical applications whose results may affect the safety of processes 
and people. 

Another approach to explanation is to accept that the reasoning of the system is 
often different from that of the user. Thus, the user and the system may have different 
interpretations of the current state of the problem solving. The differing 
interpretations will be compatible if the user and the system make proposals, explain 
their viewpoints and spontaneously produce information [2]. In order to align the 
system’s reasoning with that of the user and vice versa, the user and the system must 
co-construct the explanation in the current context of the problem solving. People 
who are trying to understand something often offer an explanation that embodies 
their current understanding, expecting to have it corrected [19]. Thus, explanations 
become an intrinsic part of the problem solving and, as a consequence, the line of 
reasoning of the system may be modified by explanation. This leads to cooperative 
problem solving. Again, context here is an extended version of the context in the 
previous approach because it also integrates direct information from users, mainly on 
the basis of their actions on the system and on the real-world process. 

Although it seems acceptable that explanations intervene in the evolving context of 
interaction, it is difficult to say more about this for two reasons. First, the 
co-building of explanations is an accepted idea, but very few studies consider it. 
Second, because context is not yet a mature research domain, its dependency upon 
explanations has not really been considered. For example, Lester and Porter [7] propose a 
model of explanation generation that includes simple methods for representing and 
updating context. However, their model makes assumptions about the representation 
of the context, not about how it is processed. 

An alternative can be found in the line of Bever & Rosembaum’s work [20]: semantic 
analysis can be seen as a semantic hierarchy of features characterized 
by a principle of cognitive economy, whose first effect is to reduce the 
infinite diversity of the environment to a finite number of categories. A formalism 
that seems suited to this is the Galois lattice [21] [13]. Following the 
property-based theory of categorization, we use the Galois lattice for 
description and for the formation of categories, a category being defined by a set of 
surface properties (visible properties), structural properties (the parts of the object and how 
they fit together), functional properties (what the object is used for) and procedural 
properties (how we use it). These categories are used to group objects that are 
gathered because they share common properties. Properties are used to form 
categories, but categories are also used to assign properties [22] [36]. Our general 
assumption is that the Galois lattice is also a suitable formalism to explain the 
cognitive construction of the verbal comparisons that are made to 
build up an explanation. 

Our approach might be useful for the automatic generation of explanations if we 
could diagnose the user’s description of the data. As shown by our current 
experiments, the labels the participants use to describe the objects might reflect their 
level of differentiation, which is to say the way they conceptualize the data. 
Context appears very useful and deserves to be considered. 
Abstract. This experiment examines the effects of context on olfactory 
descriptions. Odors are difficult to describe, and their verbalization results in 
strong individual variation. Sixty subjects were asked to describe 12 floral 
perfumes in two environmental contexts: an isolated context in which the odors 
were presented one by one, and a comparative context in which they were 
presented in groups of three. The results show contextual effects on the 
verbalization of olfactory properties. When the odors were presented in groups 
of three, 1) the subjects generated a larger number of olfactory descriptors, 2) 
there were fewer unique properties, i.e. generated by only one subject, and 3) 
subjects were more likely to verbalize general properties than specific 
properties. We discuss these results in light of categorization theories and the 
role of perceived properties in the assignment of objects to a specific category 
on the basis of context. 



1 Introduction 

We are interested in the role of context in the identification and description of sensory 
properties related to olfaction. In particular, we focus on the olfactory descriptions 
of fragrances given by naive subjects under various environmental conditions: isolated or 
simultaneous presentation of samples. 

The study of olfaction presents special difficulties. The concept of sensory 
properties tends to be more complex than in the case of vision or hearing, particularly 
because olfactory properties are often associated with taste. It is often difficult to 
describe olfactory properties because of the many processes involved in the cognitive 
integration of sensory information, including physical stimulation, sensory 
verbalization and perception. Traditionally, there are said to be five sensory 
modalities: hearing, touch, vision, taste and smell (a classification that originated with 
Aristotle). Sensory properties are associated with a specific sensory modality; they are 
classified as a function of the modality to which they belong. For example, 
transparency and translucence are visual properties of objects and are unequivocally 
associated with vision. Visual information lends itself to the classification of 
properties by modality: the appearance of objects is easy to describe (brightness, 
color, shape, etc.) and psychophysics effectively describes the relationship between 
physical stimulation and perception [1]. The same holds true for auditory information. 
By contrast, there is much greater ambiguity over the classification of properties by sensory 
modality for the chemical senses, taste and olfaction. When subjects are 
asked to give examples of olfactory perceptions, the only ones that come readily to 
mind are flowers and fruits, descriptors borrowed from other perceptions such as mild, unripe 
and sweet, and hedonic descriptors [2], which, moreover, show great inter-individual 
variability [3]. 

What is the reason for this difficulty in describing olfactory stimulation? 

Olfactory perception thresholds are highly variable from one individual to another 
[4, 5], and there are many different types of anosmia: many people are completely 
insensitive to certain molecules. But researchers who control for these perceptual differences believe 
that the encoding of olfactory information results from a personally significant event. 
Many authors [6, 7] support the idea of a special olfactory memory, different from 
that of other sensory modalities because of its very strong association with personal 
experience, which endows it with an individual character. However, the experiments 
of Chu and Downes [8] show that deeply rooted, odor-related personal memories 
mainly result from childhood experiences between the ages of six and ten. Whatever the 
case may be, odors are difficult to describe. Berglund et al. [2] conducted similarity 
tests between odor samples and demonstrated that odors could be classified along a 
hedonic dimension (bad versus pleasant), but the samples could not be characterized 
by any other descriptive dimension. The same odor sample presented to several 
subjects generated a wide variety of descriptors. These observations tend to refute the 
existence of a “true label”, i.e., a specific term associated with a specific 
odor. Elmers [9] proposes the existence of an “inner nose” that allows one to have 
internal olfactory images without necessarily possessing a specific vocabulary. In this 
way, he explains the deficiency of our olfactory language. 

Cognitive theories of categorization that integrate the effects of context could 
shed new light on the problem of olfactory description. But what do we mean by the 
“effects of context”? 



1.1 Effects of Context 

One can point to many different contextual effects: 

One such effect results from the interactions among sensory modalities, including 
the halo effect, which involves the influence of one sensory perception on another 
even though they are physiologically different. Confusions of this type between color 
and odor and between taste and odor are well known. For example, a green drink will 
be described as having a mint smell, vanilla milk will be described as sweet [10] and a 
sucrose solution will be judged sweeter if a pineapple, strawberry or caramel aroma is 
added. Degel and Koster’s experiment [11] illustrates this effect. Subjects were asked 
to match an odor (presented in a bottle) with a visual context. The pictures contained 
an image associated with an odor (for example, a cup of coffee on a table or leather 
jackets in a department store). The results demonstrate the interaction between odor 
and vision, an interaction that involves implicit memory. 
A second effect concerns mental context, which clearly emerges when 
the test instructions are changed and when the same test is conducted over different sessions 
(task repetition). Indeed, certain tasks seem to produce large intra-individual 
variations. Barsalou [12] shows that during a task in which subjects were asked to 
verbalize olfactory properties, only one-third of descriptors were common to the two 
participants, and the descriptors given by the same subject varied according to the 
point of view the subject was asked to take (his/her own or the point of view thought 
to be held by a third person). These results also show that, for the same subject, just 
over half the descriptors overlapped between the two sessions. This strong variability 
has been confirmed by other authors [13, 14]. 

The mental context can also lead subjects to favor either specific or general 
responses. For example, in response to the instruction “describe the properties of this 
object”, participants can answer “it’s a fruit, it’s an apple, it’s a Golden Delicious, 
it has a skin, etc.” This inter-individual difference in describing an object’s properties 
can be explained by pragmatic factors [15]: subjects follow conversational rules 
that lead them to be as informative as possible. They express themselves at a specific 
level so that properties can be used to distinguish one category from another. That is 
why subjects favor specific answers in response to a general question about an 
object’s characteristics: general characteristics are considered much less 
informative, since the aim is to provide information that clearly 
differentiates objects. 

A third effect of context results from the range of stimuli 
presented to the subject. Among these effects are simple contrast (objects seem 
strong when compared to other objects with low intensity, and vice versa); range 
mapping (subjects match the range of objects to the available rating scale); 
frequency bias (subjects tend to use all possible response categories the same number 
of times); contraction bias (subjects tend to match the center of the scale to the 
mid-range stimulus); centering bias (responses tend to gravitate around the center of 
the scale); and, finally, transfer effects (experiences from previous sessions affect 
the responses in the current session). These effects mainly involve the use of 
intensity scales. With respect to odors, Lawless and Heymann [10] demonstrate, for 
example, that variations in sensory quality are caused by simple contrast effects: 
the same odor (dihydromyrcenol) is judged to be more woody in a lemony context 
than in a woody context. The contexts were thus modified by the presence of other 
molecules (more woody or more lemony). 

We decided to concentrate on the last of these three contextual effects, i.e., the 
influence of objects presented at the same time. We will examine the effect of this 
context, which we call “environmental context”, on the verbalization of specific or 
general properties, not on their classification. 



1.2 Context and Verbalization of Properties 

The description of sensory properties can be studied in the framework of cognitive 
theories of categorization. Categorization can be defined as the perception of 
similarities and differences among objects in the perceptual scene, using categories 
stored in memory that define the properties on which perceived similarities and 
differences are based. Categorization is one of the essential characteristics of human 
cognition. It allows individuals to order the environment according to classifications 
that help them deal with non-identical stimuli by using the most relevant properties in 
the specific context. Categorization therefore seems to be a very important 
mechanism that plays a role in all our daily activities, from the perception of a 
stimulus to the behavioral response. It is essential for explaining the functioning of 
semantic memory, cognitive development and, in general, the major cognitive 
activities, such as planning, memorization and perception [16, 17]. 

The various theories of categorization underscore the importance of 
object properties in cognitive processes. Some researchers assign a dominant role to 
objects in the construction and use of categories [16]. In this case, properties are 
associated with categories. The features that are characteristic of objects belonging to 
a basic category, e.g. “table”, are easy to describe [18] and convey more information 
than the features characteristic of superordinate categories, e.g. “furniture”, or 
subordinate categories, e.g. “coffee table” [19]. Other authors suggest that objects are 
organized according to the properties they share [20]. In such a semantic network, the 
properties that describe objects are organized in such a way that certain properties are 
more generic while others are more specific. However, we believe that their 
description varies according to the context. Consider the following example: suppose 
that the letter ‘a’ is the object studied. Several properties can be used to categorize it: 
a character, a letter, a vowel, the first letter of the alphabet, a lower-case letter, etc. If 
we place this object in different contexts and ask subjects to underline the 
object in question, the instructions will have to be precise to avoid any ambiguity. In 
the following examples, it will be necessary to ask subjects to underline, respectively, 
the letter [385a], the lower-case letter [AaAA] and the vowel [cat]. Thus the object is 
the same, but the distinctive property varies depending on the context. 
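The ‘a’ example can be sketched in Python: given a display and the position of the target, return the first property in a fixed list that the target has and no other item in the display has. The property list, its order (starting with the broad "letter"), the function name and the restriction of vowels to a, e, i, o, u are assumptions made for illustration; the built-in `str.isalpha` and `str.islower` methods do the character tests.

```python
# Candidate properties of a character, checked in this (assumed) order, broadest first.
PROPERTIES = [
    ("letter", lambda ch: ch.isalpha()),
    ("lower-case letter", lambda ch: ch.isalpha() and ch.islower()),
    ("vowel", lambda ch: ch.lower() in "aeiou"),
]


def distinguishing_property(display, target_index):
    """First property the target has that no other item in the display has, or None."""
    target = display[target_index]
    others = [ch for i, ch in enumerate(display) if i != target_index]
    for name, has in PROPERTIES:
        if has(target) and not any(has(ch) for ch in others):
            return name
    return None


for display in ("385a", "AaAA", "cat"):
    print(display, distinguishing_property(display, display.index("a")))
# 385a letter
# AaAA lower-case letter
# cat vowel
```

The object stays the same in all three displays; only the context decides which property is needed to single it out.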

In tasks involving the generation of object characteristics, effects of context 
related to the level of specificity of properties can be expected. When the 
environmental context involves two or more categories, we assume that a larger 
number of properties will be generated, including not only distinctive properties but 
also shared properties. Vrignaud [14] compared the number of common properties 
expressed when categories were presented alone or in pairs. During the presentation 
of pairs, an average of 3.35 properties was obtained, compared to 0.78 properties 
when objects were presented in isolation. The author suggests that this effect is 
associated with the structural alignment effect described by Markman and Gentner 
[21]; common properties that share characteristics at a deep level can be aligned 
during a comparative task while more superficial characteristics are taken into 
account during a simple evaluative task. 

The purpose of our experiment is to study the effects of environmental context on 
the olfactory description of 12 perfumes. The effect of environmental context is 
observed by comparing olfactory descriptors generated under the conditions “isolated 
object (S = solo)” and “set of objects (T = trio)”. The dependent variables are the 
number of properties, their level of specificity and the number of unique properties. 

We have formulated the following three hypotheses: 

1. Specific properties are dominant when an isolated stimulus is presented and 
general properties are dominant when several objects are presented. The subject tends 
to express the general properties of objects during a comparative presentation, while 
the presentation of a single object favors the expression of more specific properties. 
2. When specific properties are dominant (when an isolated stimulus is presented), 
individual differences are greater: the mental context of the subject plays a major role 
since the environmental context presents few constraints. 

3. A larger number of properties is generated when the objects are presented 
simultaneously. We assume that the simultaneous presentation of several objects 
encourages the subject to verbalize the general properties shared by the different 
objects and the specific properties of each object. 



2 Materials and Methodology 

2.1 Participants 

Sixty participants (male and female, university students) volunteered for the 
experiment. 



2.2 Materials 

Twelve samples were chosen to represent a floral fragrance universe. The fragrances 
were flower essences: patchouli, a thousand flowers, jasmine, geranium, ylang ylang, 
chamomile, lavender, lilac, rose and three flower blends. The concentrations of the 
ethanol solutions were set to ensure equivalent perceived intensities. 
The solutions were applied to strips of paper and presented to the subjects once the 
alcohol had evaporated. 



2.3 Protocol 

In half the cases, the subjects received three samples at the same time (condition T). 
In the other half of the cases, the subjects received the samples one after the other 
(condition S). In condition T, the subject described the samples by comparing them to 
each other. In condition S, the subject described the samples one after the other. 
Verbalizations were noted in full. 



2.4 Data Analysis 

For each perfume and each subject, we noted the total number of properties generated 
per condition (in order to study the effect of context on the generation of properties) 
as well as the number of unique descriptors per condition (in order to study the 
variability among subjects according to the context). Finally, a level of specificity was 
assigned to each descriptor. To do so, we relied on a classification developed during 
previous studies [22]. This olfactory classification is based on a categorization 
scheme applied to the organization of olfactory descriptors [23, 24]. The classification 
takes the form of a tree diagram: general properties are those found at the top 
of the tree, such as “food” (level 1). Specific properties are those found on 
the farthest branches of the tree, like “lemon” (level 5). Thus, for each property 
communicated by subjects during the experimental task, a level of 
specificity-generality is assigned. Level 1 corresponds to the most general categories, while level 
5 corresponds to the most specific. Table 1 provides examples of properties 
corresponding to the five levels. 
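Counting unique descriptors, i.e. descriptors produced by only one subject within a condition, can be sketched as below. The (subject, descriptor) input format, the function name and the case-insensitive matching of descriptors are assumptions; the study does not say how descriptors were normalized before counting.

```python
from collections import defaultdict


def unique_descriptors(responses):
    """Descriptors produced by exactly one subject within one condition.

    `responses` is an iterable of (subject_id, descriptor) pairs.
    """
    subjects_by_descriptor = defaultdict(set)
    for subject, descriptor in responses:
        # Assumed normalization: ignore case and surrounding spaces.
        subjects_by_descriptor[descriptor.strip().lower()].add(subject)
    return {d for d, subjects in subjects_by_descriptor.items() if len(subjects) == 1}


# Hypothetical responses: "rose" is given by two subjects, "soap" and "lemon" by one each.
responses = [(1, "rose"), (2, "Rose"), (1, "soap"), (3, "lemon"), (3, "Lemon")]
print(sorted(unique_descriptors(responses)))  # ['lemon', 'soap']
```

A descriptor repeated by the same subject still counts as unique, since uniqueness is defined over subjects rather than over mentions.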



Table 1. Examples of properties for each level of specificity 

Level 1           Level 2              Level 3            Level 4           Level 5 
Civilization      House                Homecare product   Detergent         Dishwashing liquid 
Nature            Vegetal              Forest             Wood              Pine 
Food and drinks   Fruits / vegetable   Fruit              Citrus            Lemon 
Perception        Touch                Thermal            Hot               Burning 
Feeling           Unpleasant           Sickening          Emetic            Extremely emetic 
Value judgment    I like               I prefer           One of the best   The best 
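Assigning a level of specificity amounts to looking up the depth of a descriptor in the classification tree. A minimal sketch, assuming each row of Table 1 is one root-to-leaf branch and that descriptors are matched case-insensitively; only a few rows are encoded, the function names are ours, and descriptors absent from the tree are returned as None, i.e. "not coded".

```python
# A few branches of the classification tree from Table 1, level 1 (general) to level 5 (specific).
BRANCHES = [
    ["Nature", "Vegetal", "Forest", "Wood", "Pine"],
    ["Food and drinks", "Fruits / vegetable", "Fruit", "Citrus", "Lemon"],
    ["Perception", "Touch", "Thermal", "Hot", "Burning"],
    ["Feeling", "Unpleasant", "Sickening", "Emetic", "Extremely emetic"],
]


def build_level_index(branches):
    """Map each (lower-cased) label to its depth in the tree, 1 = top level."""
    index = {}
    for branch in branches:
        for depth, label in enumerate(branch, start=1):
            index[label.lower()] = depth
    return index


def specificity_level(descriptor, index):
    """Level 1-5 of a descriptor, or None if it is not in the classification (not coded)."""
    return index.get(descriptor.strip().lower())


INDEX = build_level_index(BRANCHES)
print(specificity_level("lemon", INDEX))    # 5
print(specificity_level("Forest", INDEX))   # 3
print(specificity_level("vanilla", INDEX))  # None
```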



The percentage of properties in each of the five levels of specificity was calculated for 
each of the experimental conditions (conditions S and T) in order to study the effect 
of context on the specificity of descriptors produced in either an isolated situation or a 
comparative situation. 
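The percentage computation can be sketched as below: count the coded levels for one condition, convert them to percentages, and add the level 2 + 3 (general) and level 4 + 5 (specific) sums reported in Table 2. The input list is hypothetical, the function names are ours, and None stands for a descriptor that could not be coded.

```python
from collections import Counter

LEVELS = (1, 2, 3, 4, 5, None)  # None = descriptor not coded


def level_percentages(levels):
    """Percentage of descriptors at each specificity level for one condition."""
    counts = Counter(levels)
    total = len(levels)
    return {lvl: 100 * counts[lvl] / total for lvl in LEVELS}


def general_specific(pct):
    """Grouped sums for levels 2 + 3 (general) and 4 + 5 (specific)."""
    return {"general (2+3)": pct[2] + pct[3], "specific (4+5)": pct[4] + pct[5]}


# Hypothetical coded levels for ten descriptors in one condition.
coded = [1, 2, 3, 4, 5, None, 4, 5, 5, 2]
pct = level_percentages(coded)
print(pct)                    # {1: 10.0, 2: 20.0, 3: 10.0, 4: 20.0, 5: 30.0, None: 10.0}
print(general_specific(pct))  # {'general (2+3)': 30.0, 'specific (4+5)': 50.0}
```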



3 Results 

We present, in turn, the results concerning the effect of context on the level 
of specificity of properties, the number of properties and the number of properties 
verbalized by a single subject (unique properties). 



3.1 Effect of Environmental Context on the Level of Specificity 

The percentages of properties by category (from 1 = general to 5 = specific) are 
presented in Table 2 according to the method of presentation of objects (S = solo, T = 
trio). 

Table 2. Percentage of properties from each level of specificity according to context condition
(S: single sample presentation, T: three samples presentation). The sum of proportions for
levels 2 + 3 and 4 + 5 are indicated in brackets (Σ).

Level                  1      General                     Specific                    Not coded   Total
                              2       3       (Σ)         4       5       (Σ)
Single sample (S)      0.40   6.40    16.00   (22.40)     33.20   38.80   (72.00)     5.20        100
Three samples (T)      1.44   16.91   20.50   (37.41)     29.86   26.62   (56.48)     4.68        100

The results are as follows:

- Few properties belong to the most general level (level 1). The most general 
categories of the classification are: civilization, nature, food, physical state, 
perception, feeling, value judgments. There are few olfactory descriptors that 
correspond to such a high level of generality. A chi-square test conducted on the two 
distributions based on the five levels resulted in a significant difference (p = 6.63 10'^) 
between the percentages by level of specificity of the two conditions. 

- There is a larger proportion of properties in levels 2 and 3 in the comparative 
condition: 37% of general properties when the odors are presented in groups of three 
and 22% when the odors are presented alone. Conversely, a larger proportion of 
specific properties is generated when the odors are presented in an isolated manner 
(72%) than in a comparative context (56%). 

These results confirm our main hypothesis about the effect of context, i.e., that
more general properties are generated when several objects are presented together.
Conversely, more specific properties are generated when the objects are presented 
alone. These are the object’s distinctive properties at the level of the basic category to 
which it belongs. 

3.2 Effect of Environmental Context on the Total Number of Properties 

The 60 subjects generated a total of 652 descriptors for the 12 aromas in the isolated 
context and the comparative context. 

The descriptors can be broken down in the following manner based on the two 
contexts (see Table 3): 



Table 3. Total number of generated properties according to context conditions (S: single
sample presentation, T: three samples presentation).

Conditions                                        S       T
Number of subjects (N)                            39      25
Total number of properties (P)                    374     278
Average number of properties by subject (P/N)     9.59    11.12


In the isolated condition, 39 subjects completed the experiment and generated a
total of 374 descriptors, which comes to an average of 9.59 descriptors per
subject. This number rose to 11.12 descriptors per subject when the samples were
presented in groups of three. A Student’s t test conducted on the two conditions shows
that the probability of erroneously rejecting the null hypothesis is 0.1028.

In line with our hypothesis, the results indicate that a larger number of properties
was generated in the comparative context, although this difference does not reach
significance at the conventional 0.05 level (p = 0.1028).
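The averages in Table 3 follow directly from the counts. The small check below recomputes P/N and the overall total from the values in the table. The t test itself cannot be reproduced here, because it needs the per-subject counts, which are not reported. The dictionary name `TABLE_3` is only illustrative.

```python
# Figures taken from Table 3.
TABLE_3 = {
    "S": {"subjects": 39, "properties": 374},
    "T": {"subjects": 25, "properties": 278},
}


def properties_per_subject(row):
    """Average number of properties generated per subject (P/N)."""
    return row["properties"] / row["subjects"]


for condition, row in TABLE_3.items():
    print(condition, round(properties_per_subject(row), 2))
# S 9.59
# T 11.12

print(sum(row["properties"] for row in TABLE_3.values()))  # 652
```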



3.3 Effect of Environmental Context on the Number of Unique Properties 

Out of 374 properties generated in the isolated condition, 227 were unique, i.e., they
were used only once, by one subject and for one odor. That number represents 60.7%
of all descriptors. In the simultaneous presentation condition, this percentage dropped
to 42.1% (Table 4).







Table 4. Number of unique properties according to context conditions (S: single sample
presentation, T: three samples presentation).

Conditions                                 S        T
Number of subjects (N)                     39       25
Number of unique properties (UP)           227      117
Percentage of unique properties (UP/P)     60.7%    42.1%


A Student t test comparing the two conditions indicates that the probability of 
incorrectly rejecting the null hypothesis is 0.0382. Therefore, there were significantly 
more unique properties generated in the isolated context. These results confirm that 
the variability between subjects is greater when the stimuli are presented alone 
(condition S) than when they are presented together (condition T). Inter-individual 
variability is therefore larger when the property is described at a more specific level. 
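The UP/P percentages in Table 4 can be recomputed from the unique-property counts (Table 4) and the property totals (Table 3). The quick check below does this arithmetic. It shows that the 60.7% figure corresponds to a base of 374 properties in the isolated condition. The variable names are illustrative only.

```python
# Unique-property counts (Table 4) and property totals (Table 3).
UNIQUE = {"S": 227, "T": 117}
TOTAL = {"S": 374, "T": 278}

for condition in ("S", "T"):
    share = 100.0 * UNIQUE[condition] / TOTAL[condition]
    print(f"{condition}: {share:.1f}%")
# S: 60.7%
# T: 42.1%
```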

Moreover, there is less variability in the condition involving the simultaneous 
presentation of odors. In the comparative condition, the effect of context is more 
pronounced: one might conclude that the objects presented generate a more complex network
of properties that includes the specific properties of the objects as well as properties
common to all or part of the objects. Conversely, in the isolated condition, the 
network of properties generated is simpler, conditioned by the single object. The 
subject’s mental context therefore carries more weight and is probably the source of 
the greater variability in the isolated context. 

These results therefore confirm the hypothesis of a greater contextual variability 
when the object is presented by itself. 



4 Conclusions 

This study of the number of properties, the level of specificity of properties and the 
number of unique properties generated in conditions of isolated context (single object) 
and comparative context (three objects) confirmed our hypotheses concerning the 
effects of context on the olfactory terms used to describe 12 perfumes. 

The results demonstrate the effects of environmental context on the verbalization 
of properties generated by the objects presented: 

- The subjects generate more olfactory properties when several odors are presented 
together: the distinctive properties of each object within each category, but also the 
distinctive properties of the objects in comparison with each other. 

- The number of unique properties is larger when an odor is presented by itself: one 
observes greater variability due to the expression of each subject’s mental context, 
which is less influenced by a restrictive environmental context as opposed to the 
situation in which several objects are presented at the same time. 

- The subjects tend to generate more specific properties than general properties in 
the isolated condition. Conversely, when several odors are presented simultaneously, 
the subjects tend to verbalize more general than specific properties. 

We can therefore conclude from this series of results that the study of context can 
contribute to a better understanding of the difficulty in describing odors. However, 
categorization, verbalization of property and context are linked. Like Poitrenaud [25] 
and Richard and Tijus [26], we believe that the variability of categorization results
from the fact that the properties perceived in the objects differ according to the
context, which consists of the other objects presented simultaneously in the
immediate environment. For example, in the context of a meal, the word “tomato” 
evokes the category, “vegetable”: one perceives that this is something eaten as an 
appetizer, entree or side dish. In an agricultural or horticultural context, however, 
tomatoes are perceived as fruits and the properties attributed to them are that they 
grow on plants, they come from flowers and they’re picked during a specific season. 
These are not the same properties that are perceived in a stimulus situation in which 
other stimuli are presented at the same time. The reason is that there is a reciprocal 
generation of properties among the objects comprising the context, so that certain 
properties stand out while others remain hidden. More specifically, the properties 
selected are 1) properties shared by all the objects, which allows subjects to assign 
them to a category and 2) the properties that distinguish these objects and which allow 
subjects to determine sub-categories. 

We have examined the effects of context with respect to the environmental context
consisting of one or three objects. Other contextual situations should be studied in
order to develop a broader understanding of the influence of environmental context. 
For example, the influence of the number of objects present on the generation of 
sensory properties associated with other sensory modalities should be investigated. It 
is also necessary to examine other types of context to better understand the effects of 
context on sensory descriptors - particularly the mental context and the type of task, 
which could be varied by, for example, changing the types of objects present while 
maintaining an equivalent olfactory context (odor on a strip of cloth or in bottles, 
natural objects, etc.). For a broader perspective, experiments could be conducted
with sensory professionals such as trained panels, oenophiles and perfume creators, in
order to relate natural categorization with expert categorization of odors.



References 

1. Bagot, J.D. (ed.): Information, sensation et perception. Armand Colin, Paris (1996) 

2. Berglund, B., Berglund, U., Engen, T., Ekman, G.: Multidimensional scaling analysis
of twenty-one odors. Scandinavian Journal of Psychology, 14 (1973) 131-137

3. Holley, A.: Éloge de l'odorat. Odile Jacob, Paris (1999)

4. Stevens, D.A., O'Connell, R.J.O.: Individual thresholds and quality reports of human 
subjects to various odors. Chemical Senses, 16 (1991) 57-67 

5. Baird, J.C., Berglund, B., Olsson, M.J.: Magnitude estimation of perceived odor intensity: 
empirical and theoretical properties. Journal of Experimental Psychology - Human
Perception and Performance, 22, 1 (1996) 244-255 

6. Aggleton, J.P., Waskett, L.: The ability of odours to serve as state-dependent cues for real- 
world memories: can Viking smells aid the recall of Viking experiences? British Journal 
of Psychology 90 (1999) 1-7 

7. Wrzesniewski, A., McCauley, C., Rozin, P.: Odor and affect: individual differences in the 
impact of odor on liking for places, things and people. Chemical Senses. 24 (1999) 713- 
721 

8. Chu, S., Downes, J.J.: Long live Proust: the odour-cued autobiographical memory bump. 
Cognition. 75 (2000) 41-50 

9. Elmers, D.G.: Is there an inner nose? Chemical Senses 23 (1998) 443-445 







10. Lawless, H.T., Heymann, H.: Sensory evaluation of food: principles and practices.
Chapman & Hall, Aspen, Maryland. (1999) 301-340 

11. Degel, J., Koster, E.P.: Implicit memory for odors: a possible method for observation. 
Perceptual and Motor Skills, 86 (1998) 943-952 

12. Barsalou, L.W.: Intraconcept similarity and its implications for interconcept similarity. In: 
Vosniadou, S. & Ortony, A. (eds.): Similarity and Analogical Reasoning. Cambridge 
University Press, New York. (1989) 76-121 

13. Bellezza, F.S.: Reliability of retrieval from semantic memory: common categories. Bulletin
of the Psychonomic Society. 22 (1984) 324-326. 

14. Vrignaud, P.: Approche différentielle de la typicalité. Unpublished doctoral dissertation,
Université de Paris V (1999)

15. Sperber, D., Wilson, D.: La pertinence. Communication et cognition. Editions de Minuit, 
Paris (1986) 

16. Rosch, E., Mervis, C.B.: Family resemblances: studies in the internal structure of 
categories. Cognitive Psychology. 7 (1975) 573-605 

17. Urdapilleta, I., Nicklaus, S., Tijus, C.: Sensory evaluation based on verbal judgments.
Journal of Sensory Studies, 14 (1999) 79-95 

18. Rosch, E. (ed.): Principles of categorization in cognition and categorization. Lawrence
Erlbaum Associates, Hillsdale, NJ (1978)

19. Zacks, J., Tversky, B.: Event structure in perception and conception. Psychological 
Bulletin, 127 (2001) 3-21 

20. Collins, A.M., Quillian, M.R.: Retrieval time from semantic memory. Journal of Verbal 
Learning and Verbal Behaviour, 8 (1969) 240-247. 

21. Markman, A.B., Gentner, D.: Splitting differences: a structural alignment view of
similarity. Journal of Memory and Language, 32 (1993) 517-535 

22. Giboreau, A., Urdapilleta, I., Richard, J.F.: Naming olfactory properties: designing an
odor description space, (submitted) 

23. Schleidt, M., Neumann, P., Morishita, H.: Pleasure and disgust, memories and 
associations of pleasant and unpleasant odors in Germany and Japan. Chemical Senses, 13 
2 (1988) 279-283 

24. Dubois, D.: Categories as acts of meaning: the case of categories in olfaction and audition. 
Cognitive Science Quarterly, 1 (2000) 35-68

25. Poitrenaud, S.: La représentation des procédures chez l'opérateur: description et mise en
oeuvre des savoir-faire. Unpublished doctoral dissertation. University Paris VIII, Paris 
(1998) 

26. Richard, J.F., Tijus, C.A.: Modelling the affordances of objects in problem solving. In 
Quelhas C. & Perera F (eds.): Cognition and Context. Special Issue of Analyse 
Psychologica, (1998) 293-315 




Varieties of Contexts 



R. Guha¹ and John McCarthy²

¹ IBM Research, San Jose, USA
² Stanford University, Stanford, USA



Abstract. We believe that a deeper understanding of the uses of con- 
texts, in terms of its impact on knowledge representation structures, as 
reflected by a corpus of examples, is vital to the programme of formaliz- 
ing contexts in Artificial Intelligence. In this paper, we examine a num- 
ber of examples from the literature from the perspective of identifying 
general usage patterns. We identify four important varieties of contexts:
Projection Contexts, Approximation Contexts, Ambiguity Contexts
and Mental State Contexts. We define each type, describe sub-types, list
benchmark examples of each sub-type, discuss their practical uses and
the requirements they make of the underlying logic. We pay particular 
attention to the problem of lifting, i.e., of using information obtained 
from one context in another and describe how these different varieties of 
contexts tend to require different kinds of lifting rules. 



1 Introduction 

Mathematics has developed and used logic to express the “eternal truths” [?]
such as Peano’s axioms, from which the rest of mathematics follows. For that 
programme, it is essential that the meaning of these axioms not depend on 
anything other than the logic in which they are stated. In particular, the cir- 
cumstances of their statement should not impact their meaning. Consequently, 
traditional logic has expressly avoided contextuality. Only recently have logi- 
cians and philosophers started developing logics which explicitly account for 
situations. 

In contrast, human communication exploits the situation or context of the 
communication, often to an extreme degree, leaving much implicit. Processing 
on these communications also exploits context, i.e., we don’t completely decon- 
textualize what we hear into a global frame of reference before we reason with it. 
In fact we argue that a complete decontextualization is not just undesirable, but 
impossible. We do however have a deep understanding of the role of the situation 
on the meaning of an utterance. Factoring this in plays an important role when 
we use information obtained from one situation in a different situation. 

We believe that logical formulae and other knowledge representations used 
by AI programs are more akin to human communication than to Peano’s axioms. 
Therefore, understanding and coping with the effects of context on representation 
structures is important for AI programs.

Since their first introduction into the logical AI programme in [?], a substan-
tial amount of work has gone into formalizing contexts. A number of different
logics ([?], [?], [?], [?]) have been proposed to deal with some (but not all!) of
the motivating examples from [?], [?], [?].

Unfortunately, the different proposed extensions to traditional logic (both 
propositional and quantified) have very different forms. [?] and [?] interpret
contexts as theories with the ist predicate corresponding to validity. [?] and
[?] provide a modal interpretation for contexts. [?] provides a different kind of
semantics based on the concepts of locality and compatibility. Consequently, it
is very difficult to compare and evaluate these different approaches in terms of
their appropriateness for representing knowledge. A complicating factor is that
while most of the examples require a quantified logic of contexts, most of the
proposed new logics² only deal with the propositional case.

We believe that a deeper understanding of the phenomenon, in terms of
knowledge representation structures, as reflected by a corpus of examples, is
vital to making progress. In this paper³, we examine a number of examples from
the literature from the perspective of identifying general usage patterns. We
identify four important varieties of contexts. Three of the varieties of context
we present (Projection Contexts, Approximation Contexts and Mental State
Contexts) are loosely correlated with the three kinds of contexts identified by
Benerecetti, Bouquet and Ghidini [?]. In this paper, we go one step further and
give precise definitions of each of these types, describe sub-types, list benchmark
examples⁴ of each sub-type, discuss their practical uses and the requirements
they make of the underlying logic. We pay particular attention to the problem
of lifting, i.e., of using information obtained from one context in another. We
describe how these different varieties of contexts tend to require different kinds
of lifting rules. We also identify an important fourth variety of contexts, namely,
Ambiguity Contexts.

2 Lifting: A Framework for Categorizing Contexts 

A computer system, especially an AI system, will work with several, perhaps 
many, contexts. The relations between sentences in different contexts are speci- 
fied by lifting relations, and so are the relations between the values of terms in 
different contexts. 

For example, we may have two contexts A3 and A5 specializing time to 3pm 
and 5pm respectively. Some sentences will be true in A5 if and only if they 

¹ Some researchers [?] have argued that the phenomenon of context is not one, but
several different, unrelated phenomena. We believe this is a reaction to the lack of
a good technical characterization of the problems that are being addressed by the
introduction of contexts.

² It is not clear that we need an extension to traditional first order logic to handle
contexts. In fact, it would be very desirable if we did not require an extension.

³ [?], an earlier paper with the same title, explores varieties of logics of contexts. In
contrast, our focus is on varieties of uses of contexts.

⁴ Due to space constraints, we are only able to give brief descriptions of the examples
in this paper. An extended version of this paper, available on the web, will contain
the details of the examples.
are true in A3. Other sentences have more complicated relations. Thus if both
contexts involve driving from San Francisco to Los Angeles, it may be that the
distance to Los Angeles is 140 miles less in A5 than in A3.
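The two kinds of lifting relation in this example can be written out explicitly. One kind is sentences that hold in A5 if and only if they hold in A3. The other is a term whose value shifts between the two contexts. The toy below assumes that a context is a Python dict from named sentences or terms to values. The names `miles_to_LA`, `origin` and `destination`, and the starting value of 300 miles, are invented for illustration. Only the 140-mile offset comes from the text.

```python
# Hypothetical toy: each context is a dict of named sentences/terms.
SAME_IN_A3_AND_A5 = {"origin", "destination"}  # true in A5 iff true in A3


def lift_A3_to_A5(a3):
    """Lift an A3 database into A5 using one explicit relation per entry."""
    a5 = {}
    for name, value in a3.items():
        if name in SAME_IN_A3_AND_A5:
            a5[name] = value          # lifts unchanged
        elif name == "miles_to_LA":
            a5[name] = value - 140    # the term's value differs between contexts
        # Entries without a lifting relation are not carried over.
    return a5


a3 = {"origin": "San Francisco", "destination": "Los Angeles", "miles_to_LA": 300}
print(lift_A3_to_A5(a3))
# {'origin': 'San Francisco', 'destination': 'Los Angeles', 'miles_to_LA': 160}
```

Note that every entry needs its own hand-written relation. This one-rule-per-entry style is exactly what stops scaling once there are many contexts and relations.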

In traditional symbolic AI systems (ranging from deployed expert systems 
to systems such as Advice Taker) there is a single uniform database containing 
the program’s knowledge. It is uniform in the sense that all sentences have the
same contextualization. This enables the program to freely combine and use its
information, i.e., if it has φ and φ ⇒ β, it can combine the two to conclude β.

As we argued earlier, representations used by AI programs are similar to 
human communications in that they have context dependence. An AI program 
that has knowledge about a broad range of topics or takes inputs from a variety 
of sources has to cope with different subsets of its knowledge having different 
contextual dependencies. The situation/context in which a formula (or other 
representation) is given to the program makes assumptions and simplifications 
which in turn affect the statement of the formula. We expect the program to
receive chunks of knowledge of varying sizes from different sources. Some will be
from interactions akin to discourses, where the contextual dependencies pertain to
the situation and topic of the discourse. Some will be in the form of task-specific
knowledge bases such as expert systems, where the dependencies will pertain both
to the fragment of the world pertinent to the system and to the task being performed.
As illustrated by these two examples, context dependencies come in a wide range 
of styles and shapes. The different kinds of assumptions and simplifications and 
the ways in which they affect formulae give us the different kinds of contexts. 

The program will contain a number of databases, each corresponding to for- 
mulae with a different set of contextual dependencies, each pertaining to a differ- 
ent fragment of the world. It uses a set of terms denoting contexts to keep track 
of and deal with these dependencies. Each context (c) corresponds to a database
(Δ_c) that pertains to a fragment (D_c) of the overall universe of discourse (D)
that the program has knowledge about. Properties of this fragment may allow
the program to make assumptions and simplifications that are not warranted by
the larger D. These assumptions and simplifications are reflected in Δ_c.

If the statements φ and φ ⇒ β have different contextual dependencies, the
program cannot always combine them to conclude β. Before combining two sen-
tences with different contextual dependencies, the program needs to reconcile
their relative contextual dependencies. This relative decontextualization is done
using a set of axioms we call Lifting Formulae⁵.
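The point can be shown with a toy forward chainer: modus ponens is applied only within a single context, so a φ stored in one context cannot fire a rule φ ⇒ β stored in another until φ has been lifted across. This is a sketch under our own simplifications, not the paper's formalism. Sentences are plain strings, and a lifting formula is modeled as a Python function that returns the lifted sentence, or `None` when the sentence does not lift. `Context`, `close_under_modus_ponens` and `lift` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """A toy context: a database of atomic sentences plus rules phi => beta."""
    name: str
    facts: set = field(default_factory=set)
    rules: list = field(default_factory=list)  # (premise, conclusion) pairs


def close_under_modus_ponens(ctx):
    """From phi and phi => beta in the SAME context, conclude beta (to a fixpoint)."""
    changed = True
    while changed:
        changed = False
        for premise, conclusion in ctx.rules:
            if premise in ctx.facts and conclusion not in ctx.facts:
                ctx.facts.add(conclusion)
                changed = True
    return ctx.facts


def lift(src, dst, lifting_formula):
    """Apply a lifting formula to every fact of src; None means 'does not lift'."""
    for fact in src.facts:
        lifted = lifting_formula(fact)
        if lifted is not None:
            dst.facts.add(lifted)


c1 = Context("c1", facts={"phi"})
c2 = Context("c2", rules=[("phi", "beta")])
print(close_under_modus_ponens(c2))   # set(): phi lives in another context
lift(c1, c2, lambda fact: fact)       # an identity lifting formula, for illustration
print(close_under_modus_ponens(c2))   # {'phi', 'beta'} (set order may vary)
```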

We believe that the main problem of contexts for an advice-taking AI pro- 
gram is that of coping with contextuality of the advice given to it. To cope with 
the contextuality, it needs to be able to factor out the relative contextuality so 
that it can use knowledge gathered in one context in another. To put it an- 
other way, the central issue in modeling contexts is that of lifting. We therefore 
look at the problem of determining varieties of contexts from the perspective of 

⁵ The name “lifting formula” came by analogy with topology. When there is a many-one
map f from one space A to another B, some facts about B can be lifted to A.
The analogy has not paid off so far.
lifting. We determine the varieties of contexts based on their assumptions and
simplifications and the effects of these on lifting formulae into and out of them. 

A brute-force approach of writing a separate lifting rule for every pair of
contexts and for every relation would require a number of axioms that makes the
Frame Problem look minor in comparison. We need a combination of good defaults
and very general lifting rules to solve this problem.

Just as the Frame Default [?] captures the intuition that most events don’t
affect most fluents, we need a similar default that most contextual factors don’t
affect the representation of most facts. This is the default that allows us to 
assume that most of what we know applies even in completely new and strange 
situations. An early attempt at capturing this intuition is described in [3], but 
clearly much more work is required. 
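One simple way to picture such a default is to let every fact lift unchanged unless an explicit override says otherwise. The sketch below is our own assumption and is not the formulation of [3]. `lift_with_default` and the override table are hypothetical names, and the facts are placeholder strings.

```python
def lift_with_default(facts, overrides):
    """Default lifting: a fact carries over unchanged unless overridden.

    overrides maps a fact to its rewritten form, or to None to block lifting.
    """
    lifted = set()
    for fact in facts:
        if fact in overrides:
            rewritten = overrides[fact]
            if rewritten is not None:
                lifted.add(rewritten)
        else:
            lifted.add(fact)  # the default: contextual factors do not affect it
    return lifted


facts = {"water is wet", "Paris is in France", "time is 3pm"}
overrides = {"time is 3pm": "time is 5pm"}
print(sorted(lift_with_default(facts, overrides)))
# ['Paris is in France', 'time is 5pm', 'water is wet']
```

Here only the exceptions need to be written down. The hard part, which this sketch does not address, is deciding which facts belong in the override table.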

For writing general lifting rules, we need a good understanding of what each 
(type of) lifting rule is doing. This is our real goal in this paper. Our goal in 
categorizing contexts is not just to organize them, but to help develop patterns 
of general and widely applicable lifting rules. In the next four sections we present 
the four varieties of contexts. 

3 Projection Contexts 

Sometimes, we can make assumptions about certain objects that occur often 
in the sentences in the database corresponding to the context c. These as-
sumptions allow us to simplify Δ_c by dropping portions of sentences pertaining
to these assumptions. In the extreme, if the assumptions are strong enough, we
can start dropping parameters to functions/predicates corresponding to these
objects. The resulting simplified Δ_c can be seen as a projection of a more gen-
eral Δ which makes these assumptions explicit. The actual projection operator
depends on the structural form of the assumption. We now examine some of the 
popular examples in the literature that correspond to Projection Contexts. 



Ex. 1: Normalcy/Kindness Assumptions 

The use of contexts for making Kindness Assumptions [?] in planning and for
making Normalcy Assumptions in Cyc’s Microtheories falls into this category.
In such uses, we assume “normal” conditions, e.g., people are acting rationally,
the physical location of the objects is on/near earth’s surface, etc. We then leave
these assumptions unstated in the database fragment Δ_c. These conditions might
be made explicit only when the system tries to lift statements out of Δ_c. These
assumptions might allow us to simplify the vocabulary of Δ_c to a point where the
assumptions can no longer be expressed in Δ_c. [?] shows how a fully qualified
axiom such as (using the syntax of [?])

$$ c_0 : (\forall x)\, haveTicket(x) \wedge atAirport(x) \wedge clothed(x) \wedge conscious(x) \wedge \ldots \Rightarrow canFly(x) \tag{1} $$

can be simplified in an appropriate TravelContext to

$$ TravelContext : (\forall x)\, haveTicket(x) \wedge atAirport(x) \Rightarrow canFly(x) \tag{2} $$
In another example, Cyc has the following axiom, which pushes qualifi-
cations such as the person acting rationally, the workplace not being a beach,
there not being a fire, etc. onto the WorkplaceContext.

$$ WorkplaceContext : (\forall x)\, Person(x) \wedge atWork(x) \Rightarrow clothed(x) \tag{3} $$

Lifting into such contexts requires verifying that the normalcy assumptions 
are satisfied. Similarly, lifting out of such contexts requires qualifying the lifted 
axioms with normalcy assumptions. If the context c_1 makes the assumptions γ_1,
γ_2, ..., and c_2 does not, and c_2 makes the assumptions β_1, β_2, ..., which c_1
does not, then these lifting axioms have the form:

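$$ c_0 : (\forall x)\, ist(c_1, \phi(x)) \wedge \gamma_1(x) \wedge \gamma_2(x) \wedge \ldots \Rightarrow ist(c_2, \phi(x)) \tag{4} $$

$$ c_0 : (\forall x)\, ist(c_2, \phi(x)) \wedge \beta_1(x) \wedge \beta_2(x) \wedge \ldots \Rightarrow ist(c_1, \phi(x)) \tag{5} $$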
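At the level of rule shapes, the two directions can be mimicked. Lifting a rule out of a normalcy context adds that context's assumptions as extra antecedents, which turns (2) into the shape of (1). Going back the other way drops those antecedents again. The sketch below is a purely syntactic toy: the `Rule` type is a hypothetical representation, and the assumption list `("clothed", "conscious")` is an illustrative fragment of the open-ended "..." in (1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedents: tuple
    consequent: str


# Hypothetical fragment of the assumptions TravelContext makes and c0 does not
# (the gamma_i of (4)); the real list is open-ended.
TRAVEL_ASSUMPTIONS = ("clothed", "conscious")


def lift_out(rule, assumptions):
    """Lifting out: qualify the rule with the context's normalcy assumptions."""
    extra = tuple(a for a in assumptions if a not in rule.antecedents)
    return Rule(rule.antecedents + extra, rule.consequent)


def simplify_in(rule, assumptions):
    """Lifting in: antecedents guaranteed by the context's assumptions are dropped."""
    kept = tuple(a for a in rule.antecedents if a not in assumptions)
    return Rule(kept, rule.consequent)


travel_rule = Rule(("haveTicket", "atAirport"), "canFly")       # shape of (2)
full_rule = lift_out(travel_rule, TRAVEL_ASSUMPTIONS)            # shape of (1)
print(full_rule.antecedents)  # ('haveTicket', 'atAirport', 'clothed', 'conscious')
print(simplify_in(full_rule, TRAVEL_ASSUMPTIONS) == travel_rule)  # True
```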

Arguably, this use of contexts can often be replaced by the use of non- 
monotonic defaults. However, this use of contexts is not just for “normalcy” 
conditions. They are useful for capturing open-ended bundles of assumptions 
pertaining to classes of situations, including non-normal ones. For example, we 
might have a context corresponding to “War Time Conditions”, analogous to 
the above WorkplaceContext. Since there is nothing normal about war time, it 
is probably not reasonable to always assume the assumptions made by such a 
context. 

This example also illustrates how contexts can be very useful when combining 
knowledge built for different purposes. A knowledge base built for battlefield 
management might very reasonably assume that there is a war going on. On the 
other hand, a medical diagnosis system would probably not want to make that 
assumption. Contexts provide a useful mechanism for encapsulating axioms based
on their origin.

This example uses contexts as a solution to the qualification problem. It 
pushes the qualification onto the context parameter which acts as a hook for 
stating the qualifications in an incremental fashion, without rewriting all the 
axioms in the context. 

Contexts are also required, for this kind of use, when the assumptions and
simplifications cause a change in the language of Δ_c. Amarel’s [?] example of
reformulating the missionaries and cannibals problem, in which the assumption 
that the identity of particular missionaries and cannibals is not relevant allows 
for a reformulation that significantly reduces the computational complexity, is 
an example of this. 



Ex. 2: Parameter Suppression 

When the assumption is strong enough, the objects (about which assumption is 
being made) may become irrelevant and we may be able to drop references to 
them from our predicates and functions. AboveTheory ([?], [?]) is an example
of this where the situation parameter gets suppressed. If we can assume that the
world is static, i.e., if the situation argument is the same for all statements of the
form holds(φ, ⟨situation⟩), we can simply drop the reference to the situation
and write just φ. Contexts such as specSit(s_1) [?], obtained by focusing on a
particular situation s_1, also fall into this category. Here are some examples of
axioms in the AboveTheory:

$$ AboveTheory : (\forall x\, y)(on(x, y) \Rightarrow above(x, y)) \tag{6} $$

$$ AboveTheory : (\forall x\, y\, z)(above(x, y) \wedge above(y, z) \Rightarrow above(x, z)) \tag{7} $$

Lifting involving contexts such as specSit(s_1) is quite straightforward. We
only lift into the context those axioms which have the appropriate value for that 
parameter. Similarly, when lifting out, we reinstate the appropriate value for the 
parameter. 

$$ blocks : (\forall x\, y\, s)(holds(on(x, y), s) \Rightarrow ist(specSit(s), on(x, y))) \tag{8} $$

$$ blocks : (\forall x\, y\, s)(holds(above(x, y), s) \Rightarrow ist(specSit(s), above(x, y))) \tag{9} $$

If the parameter is not fixed to a particular value but is only assumed to be
the same (whatever it is) across all axioms, as in the case of AboveTheory,
then when lifting out we reinstate all occurrences of the parameter as a
universally quantified variable.

$$ c_0 : ist(AboveTheory, (\forall x)\, \phi(x)) \Rightarrow ist(blocks, (\forall x, s)\, holds(\phi(x), s)) \tag{10} $$

Lateral lifting, in which axioms are lifted without any modification from con-
texts such as AboveTheory into contexts such as specSit(s_1), is accomplished
with axioms like the following:

$$ c_0 : (\forall s)(ist(AboveTheory, (\forall x)\, \phi(x)) \Leftrightarrow ist(specSit(s), (\forall x)\, \phi(x))) \tag{11} $$

As we mentioned in section 2, writing lifting axioms specific to particular 
contexts such as AboveTheory will not scale. We need more general axioms such 
as the following, which will work across all static theories. 



$$ c_0 : (\forall s\, c)(StaticTheory(c) \Rightarrow (ist(c, (\forall x)\, \phi(x)) \Leftrightarrow ist(specSit(s), (\forall x)\, \phi(x)))) \tag{12} $$

Contexts such as AboveTheory and specSit(s_1) suppress the situation (i.e.,
temporal) parameter and correspond to static models of the world. Similarly, 
parameters corresponding to location can be suppressed to create spatially local 
models of the world. 
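A procedural analogue of the static AboveTheory shows why lateral lifting works. Axioms (6) and (7) never mention a situation, so the same closure computation is valid inside every specSit(s). Lifting out then amounts to running it once per situation and re-attaching s. This is an illustrative sketch, not an implementation of the ist logic. `above_closure` and `above_in_blocks` are hypothetical names, and facts are encoded as tuples.

```python
def above_closure(on_pairs):
    """Situation-free AboveTheory: (6) on(x,y) => above(x,y); (7) above is transitive."""
    above = set(on_pairs)                      # axiom (6)
    changed = True
    while changed:                             # axiom (7), applied to a fixpoint
        changed = False
        for (x, y) in list(above):
            for (y2, z) in list(above):
                if y == y2 and (x, z) not in above:
                    above.add((x, z))
                    changed = True
    return above


def above_in_blocks(holds_on):
    """Lifting out: apply the static theory in every situation s and re-attach s."""
    by_situation = {}
    for (x, y, s) in holds_on:
        by_situation.setdefault(s, set()).add((x, y))
    return {(x, y, s)
            for s, pairs in by_situation.items()
            for (x, y) in above_closure(pairs)}


holds_on = {("a", "b", "s1"), ("b", "c", "s1"), ("c", "a", "s2")}
print(sorted(above_in_blocks(holds_on)))
# [('a', 'b', 's1'), ('a', 'c', 's1'), ('b', 'c', 's1'), ('c', 'a', 's2')]
```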
Ex. 3: Database Partitioning/Segmentation

Nayak [?] describes SIGMA [?], a knowledge base of scientific domain knowl-
edge that supports building executable domain models. SIGMA contains knowl-
edge describing two very different application domains: modeling the atmosphere
of one of Saturn’s moons (Titan) and modeling a forest ecosystem. By separating
the axioms of these two very different domains into different contexts, reasoning
can be focused on just the axioms of the relevant domain.

Cyc makes similar use of its Microtheory mechanism. The Naive Physics
Microtheory, which contains a simple model of the physical world, has little in 
common with the US Legal microtheory which contains a simple model of the 
US legal process. By separating the axioms into different Microtheories, both 
knowledge entry and subsequent reasoning can be simplified. 

In addition to long-lived contexts such as the Naive Physics Microtheory,
Cyc also uses shorter-lived Problem Solving Contexts [?] for focusing on a
particular problem that it is trying to solve. When Cyc is given the description
of a scenario (about which it will be asked questions), Cyc creates and uses
a specialized Problem Solving Context (PSC) for that scenario. The scenario
description, typically a set of ground facts, is entered into the PSC and general
axioms are lifted into this PSC from Microtheories such as the Naive Physics
Microtheory. Based on differences in normalcy assumptions, approximations, etc.
between the PSC and Microtheory, the lifting might involve changing the axioms.
The PSC serves to focus the inference engine on only the relevant objects.

In this role, contexts act like a “package” mechanism, not unlike the package mechanism found in programming languages such as Java and Lisp, with the caveat that lifting into Problem Solving Contexts is substantially more sophisticated and complex than importing Lisp or Java objects between packages.
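The package analogy can be made concrete with a small sketch. It models each context as a named set of ground facts and lifts the relevant axioms of a microtheory into a problem solving context, optionally rewriting or vetoing each axiom on the way in. The Context class, the lift function, the tuple encoding of facts and the microtheory contents are all hypothetical; Cyc's actual lifting machinery is considerably richer.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Fact = tuple  # a ground fact, e.g. ("isa", "Ball1", "Object")


@dataclass
class Context:
    """A context modeled as a named set of ground facts (hypothetical encoding)."""
    name: str
    facts: set = field(default_factory=set)

    def ist(self, fact: Fact) -> bool:
        """True if ist(self, fact) holds, i.e. the fact is in this context."""
        return fact in self.facts


def lift(source: Context, target: Context,
         relevant: Callable[[Fact], bool],
         rewrite: Callable[[Fact], Optional[Fact]] = lambda f: f) -> None:
    """Copy the relevant facts of `source` into `target`, rewriting each one.

    `rewrite` may change a fact or return None to veto it, standing in for
    differences in normalcy assumptions between the two contexts.
    """
    for fact in source.facts:
        if relevant(fact):
            lifted = rewrite(fact)
            if lifted is not None:
                target.facts.add(lifted)


naive_physics = Context("NaivePhysicsMt", {
    ("unsupportedFalls", "Object"),
    ("frictionless", "Surface"),
    ("liquidFlowsDownhill", "Liquid"),
})
psc = Context("PSC-Scenario1", {("isa", "Ball1", "Object")})

# Lift only axioms about objects and surfaces, and veto the idealized
# frictionless assumption, which this scenario does not share.
lift(naive_physics, psc,
     relevant=lambda f: f[1] in {"Object", "Surface"},
     rewrite=lambda f: None if f[0] == "frictionless" else f)
```

Unlike a plain package import, the rewrite hook lets the receiving context alter or reject what it is given, while the source microtheory is left unchanged.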

3.1 Discussion 

Projection Contexts are probably the most widely used type of contexts in implemented KR systems. In fact, systems such as KRL had a form of contexts for this purpose long before contexts were introduced into logical AI.

The primary demand that Projection Contexts make of the underlying logic comes when the simplifying assumptions are strong enough to warrant a change in the vocabulary. In such a case, we need the underlying logic to be much less restrictive than traditional first order logic (FOL) about well-formedness, etc. For example, in traditional FOL, holds(s1, on(a, b)) ∧ on(a, b) is typically not a well-formed formula, since on is used both as a function symbol and as a predicate. But with the kind of vocabulary simplifications introduced by Projection Contexts, we might very well require it in our database. Non-traditional variants of FOL, such as those described by Hayes and Menzel [?], seem capable of providing this functionality.

4 Approximation Contexts 

Often, the task for which a database is used permits us to use approximate models for representation and reasoning. The best-known approximate model is Newtonian mechanics. Subfields of AI such as qualitative physics have extensively used approximations that make it feasible to model complex phenomena. Approximations are widely used in the commonsense world too; e.g., we often approximate the price of an object by ignoring tax, shipping, etc.



Ex. 4: Attribute Approximation 

In the simple case, the value of a particular attribute (f) of an object is approximated. These approximations have the general form

$f(x) = f_1(x) \approx f_1(x) + f_2(x) + f_3(x) + \cdots$    (13)

To obtain the approximation context, we substitute occurrences of the right 
hand side with the left hand side. Similarly, when lifting out of such approxima- 
tion contexts, we add back the terms that were dropped out. 
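In the simplest reading of (13), the approximation context keeps only the dominant term f1(x), and lifting out adds the dropped terms back. The sketch below assumes the terms are plain numbers held in a list; the function names and the price components are hypothetical.

```python
def full_value(terms: list[float]) -> float:
    """The exact attribute value f(x) = f1(x) + f2(x) + f3(x) + ..."""
    return sum(terms)


def approximate(terms: list[float]) -> tuple[float, list[float]]:
    """Inside the approximation context only f1(x) is kept.

    The dropped terms are returned as well, so they can be restored on lifting.
    """
    return terms[0], terms[1:]


def lift_out(approx: float, dropped: list[float]) -> float:
    """Lifting out of the approximation context adds the dropped terms back."""
    return approx + sum(dropped)


# Hypothetical price components: base price, tax, shipping.
components = [1000.0, 80.0, 25.0]
approx, dropped = approximate(components)
```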

The database example given in [?], in which the Navy, the Air Force and GE all have databases of prices, each a different approximation of the total price paid by the taxpayer, is an example of Attribute Approximation. The price of an object in the GE database is approximated so as not to include spare parts, unlike the Navy database, which includes spare parts, and the Air Force database, which includes both spare parts and inventory costs. Presumably, the true price of the object, including discounts, shipping costs, etc., is recorded somewhere else.

When lifting formulae from Approximation Contexts to contexts that don’t make the assumption, we need to factor in the terms that were approximated out. When combining data in the GE database with data in the Navy database, we have to be careful about the approximations made by each. This is done with lifting formulae like the following:

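$c_0 : (\forall x\, y)\; \mathit{ist}(c_{navy}, \mathit{price}(x) = y) \Leftrightarrow \mathit{ist}(c_{GE}, y = \mathit{price}(x) + \mathit{price}(\mathit{spares}(x)))$
$\qquad \Leftrightarrow \mathit{ist}(c_{airforce}, y = \mathit{price}(x) + \mathit{price}(\mathit{spares}(x)) + \mathit{cost}(\mathit{inventory}(x)))$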
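The GE-to-Navy part of this lifting formula can be sketched as a translation between two price tables. A GE price leaves out spare parts, so a Navy-comparable figure is obtained by adding the price of the spares back. The table names, items and figures are invented for illustration, and the Air Force leg is not modeled.

```python
# Hypothetical per-context price tables, in dollars.
ge_prices = {"engine": 2_000_000, "radar": 750_000}      # excludes spare parts
spares_prices = {"engine": 150_000, "radar": 40_000}     # price(spares(x))


def lift_ge_to_navy(item: str) -> int:
    """From ist(c_GE, y = price(x) + price(spares(x))) obtain the Navy price y."""
    return ge_prices[item] + spares_prices[item]


def lift_navy_to_ge(item: str, navy_price: int) -> int:
    """The reverse direction: strip the spare parts back out of a Navy price."""
    return navy_price - spares_prices[item]


navy_prices = {item: lift_ge_to_navy(item) for item in ge_prices}
```

Keeping both directions explicit mirrors the point of the lifting axiom: neither database is "wrong", and each figure is only meaningful relative to the approximation its context makes.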



Ex. 5: Structural Approximation 

Attribute Approximations only affect a particular attribute (typically numeric) 
of some class of objects. More complex are structural approximations such as 
approximating a car as a cuboid, a somewhat curved road as a line and processes 
as being instantaneous. These map one set of objects (such as a car, a road 
or processes) to a corresponding, more easily modeled set of objects (such as a 
cuboid, line or instantaneous event). Structural approximations are very common 
when modeling the physical properties of objects. Objects get approximated into 
regular shapes that are characterizable by simple geometric formulas, into shapes
with lower dimensions, etc. Having done this, the Approximation Context may
altogether dispense with the original object, often using the same symbol to 
denote the approximate model. 
When lifting conclusions from contexts that make structural approximations, it is important not to lift axioms pertaining to the approximation itself. So, for example, if for the purpose of calculating the distance to a certain galaxy we approximate the Earth as a point, it would be inappropriate to lift the conclusion that the Earth’s volume is zero. In fact, most structural approximations are targeted at computing particular attributes of the approximated object (e.g., its distance to the galaxy), and these are the only attributes of the object that can be lifted out.
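The rule that only the targeted attributes may be lifted out can be sketched as a filter at the context boundary. The PointModel class, the LIFTABLE whitelist and the coordinates are hypothetical; the point is only that the zero volume computed inside the approximation never leaves it.

```python
import math
from dataclasses import dataclass


@dataclass
class PointModel:
    """Approximation context: the Earth modeled as a point (hypothetical)."""
    position: tuple[float, float, float]
    volume: float = 0.0  # an artifact of the approximation, not a real fact


# Attributes this approximation was built to compute; only these may be lifted.
LIFTABLE = {"distance"}


def lift_conclusions(conclusions: dict[str, float]) -> dict[str, float]:
    """Drop conclusions that merely reflect the approximation itself."""
    return {name: value for name, value in conclusions.items() if name in LIFTABLE}


earth = PointModel(position=(0.0, 0.0, 0.0))
galaxy = (3.0, 4.0, 0.0)  # hypothetical coordinates, arbitrary units
conclusions = {
    "distance": math.dist(earth.position, galaxy),
    "volume": earth.volume,
}
lifted = lift_conclusions(conclusions)
```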



4.1 Discussion 

Approximate models are probably the most widely used case of people using dif- 
ferent explicit models of a phenomenon. This is especially the case in engineering 
and science. 

Projection Contexts and Approximation Contexts are similar in that they exploit assumptions to formulate simpler theories. However, there is an important distinction between these two classes of contexts. The assumptions made by Projection Contexts are typically consistent with the database Δ from which the context Δc is projected. In contrast, the assumptions made by Approximation Contexts are usually logically inconsistent with the context they are derived from. So, in addition to the requirement made by Projection Contexts, Approximation Contexts also require that the system tolerate the inconsistency between the more accurate model and its approximation. In particular, Approximation Contexts need some form of referential opacity so that formulas such as the following are not invalid:

$\mathit{ist}(c_1, \mathit{volume}(\mathit{Earth}) = 0) \wedge (\mathit{volume}(\mathit{Earth}) = 1{,}097 \times \ldots)$    (14)

5 Ambiguity Contexts 

Sometimes, the reference of a symbol might be unambiguous in a narrow scope or 
situation in which certain constraints may be assumed, but ambiguous in a larger 
scope without the aid of these additional, often implicit constraints. The goal of 
Ambiguity Contexts is to capture this scope so that statements containing the 
ambiguous references can be given to the program without full disambiguation. 
The narrowness of the scope can also be used to advantage to perform more 
efficient reasoning. The scope could be defined by the situation, by a discourse 
or by the problem solving goals of the program. 



Ex. 6: Indexicals 

Indexicals (such as he, she, it, now and here) are the best examples of the use of Ambiguity Contexts. [?], [?] and many others have shown how a logical formula such as hungry(He, Now), which contains the unresolved indexicals He and Now, can be added to a database in a limited Discourse Context. The advantage of doing this is that disambiguation can be postponed, while the reasoning engine can still profitably derive conclusions from the statement.
The following examples, adapted from [?], illustrate the use of linguistic terms and predicates in a formal language. It is assumed that a natural language front end, using a lexicon and linguistic knowledge, rephrases the natural language utterance as a formula. This formula might be heavily context dependent, in the manner illustrated below.

Pronouns and indexicals such as He, She, It, Now, I, etc. are terms in the language. The sentence “he is hungry” translates to hungry(He), “it is now 4pm” to (Now = 4pm), and so on.

The language includes the functions The and A to handle definite and indefinite references. The function A is similar to the indefinite article “a”. The sentence “the lady owns a bag” would be translated into owns(The(Lady), A(Bag)).

Constraints such as the following, together with an appropriate minimization of the predicate present (which specifies whether a context includes a certain object in its domain), enable a program to use a wide range of knowledge and deduction techniques to determine the denotation of indexicals.

$(\forall c\, y)\; \mathit{ist}(c, y = \mathit{It}) \Rightarrow \mathit{ist}(c, \neg\mathit{Person}(y)) \wedge \mathit{present}(c, y)$    (15)
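Read procedurally, constraint (15) narrows the candidate referents of It to objects that are present in the context and are not persons. The sketch below applies that filter to a hypothetical discourse context. It is a simplification: a real system would combine the filter with the minimization of present and with further knowledge, rather than requiring a unique survivor.

```python
from typing import Optional


def resolve_it(present: set[str], persons: set[str]) -> Optional[str]:
    """Candidates for It: objects present in the context that are not persons.

    Returns the referent if the constraint leaves exactly one candidate,
    otherwise None, meaning the indexical stays unresolved for now.
    """
    candidates = {y for y in present if y not in persons}
    return next(iter(candidates)) if len(candidates) == 1 else None


# Hypothetical discourse context: Fred, Jane and a rose are present.
discourse_present = {"Fred", "Jane", "Rose1"}
discourse_persons = {"Fred", "Jane"}
referent = resolve_it(discourse_present, discourse_persons)
```

Returning None instead of guessing matches the motivation for Ambiguity Contexts: an unresolved It can stay in the database until more information arrives.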

Ex. 7: Homonymy 

Buvac [?] describes the use of contexts to capture a different kind of ambiguity than that exhibited by indexicals. Consider the statement “He went to the bank”, where it is not clear whether the word ‘bank’ denotes a financial bank or a river bank. In typical natural language systems, this disambiguation would have to be done before the parse of the statement could be added to the database. Buvac shows how, with contexts, the natural language front end can add the ambiguity-preserving translation to the database. Buvac considers the statement “Vanja is at the bank”. The denotation of “bank” is ambiguous. The statement can be added to an appropriate Discourse Context c_do as:

$c_{do} : \mathit{at}(\mathit{Vanja}, \mathit{The}(\mathit{Bank}))$    (16)

Next, we are told that Vanja got money from the bank:

$c_{do} : \mathit{gotMoney}(\mathit{Vanja}, \mathit{The}(\mathit{Bank}))$    (17)

Based on common sense axioms such as the following, Buvac shows how the 
system can infer that the bank must be a financial bank and not a river bank. 

$c_0 : (\forall c\, x\, y)\; \mathit{ist}(c, \mathit{gotMoney}(x, y) \Rightarrow \mathit{FinancialBank}(y))$    (18)
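Buvac's inference is deductive, but its effect on the ambiguous term can be sketched as sense filtering: each predicate applied to The(Bank) may restrict the admissible senses, as axiom (18) does for gotMoney. The sense inventory, the constraint table and the function name are hypothetical, and the sketch is a simplification rather than Buvac's formalism.

```python
# Hypothetical sense inventory for the ambiguous lexical item "Bank".
SENSES = {"Bank": {"FinancialBank", "RiverBank"}}

# Axiom (18) read as a sense constraint: anything one gets money from
# must be a FinancialBank. Predicates not listed impose no constraint.
CONSTRAINTS = {"gotMoney": {"FinancialBank"}}


def disambiguate(term: str, predicates: list[str]) -> set[str]:
    """Intersect the senses of `term` with what each predicate allows."""
    senses = set(SENSES[term])
    for pred in predicates:
        if pred in CONSTRAINTS:
            senses &= CONSTRAINTS[pred]
    return senses


# After (16), at(Vanja, The(Bank)) leaves both senses open ...
after_16 = disambiguate("Bank", ["at"])
# ... while (17), gotMoney(Vanja, The(Bank)), forces the financial reading.
after_17 = disambiguate("Bank", ["at", "gotMoney"])
```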

[?] shows how the same approach can be used to treat prepositions such as at, to, for, etc. The introduction of “predicates” such as for allows us to translate the sentence “Fred bought the rose for Jane” as

$(\exists e)(\mathit{Buying}(e) \wedge \mathit{object}(e, \mathit{The}(\mathit{Rose})) \wedge \mathit{for}(e, \mathit{Jane}))$    (19)

Similarly, a function Etc of variable arity can be used to represent ellipsis. The sentence “Fred likes ice cream, softees, etc.” would be translated as

$\mathit{likes}(\mathit{Fred}, \mathit{Etc}(\mathit{IceCream}, \mathit{Softee}))$    (20)
Ex. 8: Metonymy & Polysemy

With homonymy, the different senses of the word denote very different
and unrelated things. With metonymy and polysemy, the different denotations
are closely related. Consider the two sentences “flight UAL201 landed in San 
Francisco at 1.35pm” and “San Francisco elected John Smith to ...”. In the first 
sentence, ’San Francisco’ denotes San Francisco’s airport (which isn’t even in 
the city of San Francisco), and in the second, it denotes the electoral community 
of the city of San Francisco. There are many other such denotations of the term 
’San Francisco’ (the actual land mass, all the people, the executive branch of the 
city, the city’s economy, ... ). Common natural language usage typically does not 
use different terms for these different concepts. The usage of the term is usually 
adequate to distinguish between them. 

As with indexicals and homonymy, we can use contexts to preserve this kind of ambiguity as well. However, unlike indexicals and homonymy, in many cases statements with metonymy/polysemy ambiguities can be lifted without resolving the ambiguity. Indexicals and homonymy are purely linguistic phenomena. Metonymy and polysemy are not just linguistic but also epistemic phenomena. Consider the following example. In a simple theory about wars, attacks, etc., we might not distinguish between a country, its government and its armed forces. So, we might have axioms such as the following, which says that before a country attacks another, the head of state of that country has to approve it.

$\mathit{occurs}(s_1, \mathit{attacks}(x, y)) \Rightarrow \mathit{occurs}(\mathit{prior}(s_1), \mathit{approve}(\mathit{headOfState}(x), \mathit{attacks}(x, y)))$    (21)

This model of the world, in which we don’t distinguish between the different branches of the government, etc., is adequate for a great many tasks. Now, consider a context describing a coup or mutiny in which one arm of the state fights another. Clearly, our simple representation breaks down. In particular, axioms like the one given above no longer hold. At this point, we would like to switch to the finer-grained representation.



5.1 Discussion 

Ambiguity Contexts enable the database to contain logical statements which still 
have indexicals and homonymous references in them. This provides a great deal 
of flexibility in when and how these references are disambiguated. In particular, 
it becomes easier to use domain knowledge and the logical inferencing apparatus 
for disambiguation. 

That said, this use of contexts can be replicated without resorting to contexts, by introducing new terms. For example, every new occurrence of an indexical (such as he) could be mapped to a new term (such as he-3994), with the appropriate constraints added to he-3994. Alternatively, one could introduce a term such as he(utterance_i), which refers to the denotation of the word ‘he’ in utterance_i. Homonymy can be treated similarly.
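The fresh-term alternative is easy to sketch. Each occurrence of an indexical is replaced by a new constant, and a side table records where it came from so that constraints can later be attached to it. The naming scheme he-1, he-2, ... and the function name fresh_term are hypothetical, standing in for terms such as he-3994 in the text.

```python
import itertools

_next_id = itertools.count(1)  # source of unique suffixes


def fresh_term(indexical: str, utterance_id: str, constraints: dict) -> str:
    """Replace one occurrence of an indexical by a brand-new constant.

    The side table `constraints` records the word and the utterance it came
    from; further constraints (e.g. that the referent is male) can be added
    to the same entry later.
    """
    term = f"{indexical}-{next(_next_id)}"
    constraints[term] = {"word": indexical, "utterance": utterance_id}
    return term


table: dict = {}
t1 = fresh_term("he", "utterance1", table)
t2 = fresh_term("he", "utterance2", table)
table[t1]["male"] = True  # a constraint attached to the first occurrence only
```

Because every occurrence gets its own constant, he-1 and he-2 can denote different people without contradiction, which is what the context-based treatment achieves by other means.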

The use of contexts for metonymy and polysemy, on the other hand, is much more significant and powerful. As experience with Cyc [?] shows, broad, large-scale knowledge bases, which cover many different aspects of a set of objects, have to make many subtle distinctions. For example, Cyc distinguishes between the
land mass associated with a city, its populace, different branches of its govern- 
ment, its head, and so on. While the ability to make these subtle distinctions is 
useful, it makes the task of knowledge entry substantially more difficult. Further, 
most of these distinctions are non-essential in most circumstances. Being forced 
to make them all the time complicates both knowledge entry and subsequent 
reasoning. Contexts provide a mechanism by which we can use the simplest 
formalism, i.e., the one that makes the fewest distinctions, most of the time, 
transcending to the more expressive representation only when we need to. 

The main requirement that Ambiguity Contexts impose on the underlying logic is that of referential opacity. In other words, the formula ist(c1, He = John) ∧ ist(c2, He = Jane) ∧ (Jane ≠ John) should not be invalid.

6 Mental State Contexts 

Mental State Contexts correspond to the use of contexts to capture propositional 
attitudes and knowledge of other kinds of “alternate” states of affairs such as 
fiction. 

Unlike the previous four kinds of contexts, these contexts are not character- 
ized by what they contain, but in terms of their provenance. Consequently, there 
is little that we can say in general about lifting into/from them. 

Ex. 9: Fictional Contexts 

In [?], McCarthy gives the example of using contexts to make statements that are true in the fictional context corresponding to the Sherlock Holmes stories. Such a context could include statements like:

$\mathit{SherlockHolmesContext} : \mathit{Detective}(\mathit{Holmes})$    (22)
$\mathit{SherlockHolmesContext} : \mathit{partner}(\mathit{Holmes}, \mathit{Watson})$    (23)

We rarely, if ever, lift axioms out of fictional contexts. However, we may lift axioms from non-fictional contexts into fictional contexts. So, for example, even though the SherlockHolmesContext does not explicitly state that Rome is in Italy, we can lift this from a non-fictional context into the SherlockHolmesContext.

Ex. 10: Perspectives, Counterfactuals, and Propositional Attitudes

Contexts may be used to represent the world from the perspective of an agent. These non-fictional contexts are closely related to different kinds of propositional attitudes, which have been widely studied in philosophy and AI. There is a rich body of examples from those fields. Contexts have been proposed in [?], [?] and elsewhere as a mechanism for handling perspectives and propositional attitudes in general. Ghidini and Giunchiglia [?] provide an example of a “Magic Box”, which contains 6 sectors, each possibly containing a ball. Two agents, Mr. 1 and Mr. 2, each have different views of the box, based on their physical locations. They also consider the case of each agent having a partial view of the box. It is further possible to consider each agent’s view of the other agent’s view, and so on.

Costello and McCarthy [?] treat counterfactuals using contexts. For example, consider the sentence “If another car had come over the hill when you passed, there would have been a head-on collision.” The clause “If another car had come over the hill when you passed” defines a counterfactual context. Note that the context is highly incomplete: it doesn’t say exactly when, or what kind of car.



6.1 Discussion 

Of the different varieties of contexts, Mental State Contexts are probably the most demanding of the underlying logic. In addition to the requirements imposed by the earlier categories, they also bring in the requirements imposed by propositional attitudes [?]. It is indeed possible that this variety of contexts is a different phenomenon, best dealt with using different machinery.

7 Conclusion 

In this paper we looked at a number of examples of contexts and distilled them 
into four important categories, each of which has distinct properties in terms 
of lifting and each of which imposes different requirements on the underlying 
logical machinery. We hope that these categories and examples will be useful in 
comparing and evaluating different approaches to dealing with contexts. 

Acknowledgements. We would like to thank Valeria dePaiva for comments 
on a draft of this paper. 
Abstract. Context-aware applications play an important role in the ubiquitous computing (ubiComp) environment by providing the user with comprehensive services even without any explicitly triggered command. In this paper, we propose a unified context-aware application model, which is essential for developing various applications in the ubiquitous computing environment. The proposed model maintains independence between sensors and applications by using a unified context in the form of Who (user identity), What (object identity), Where (location), When (time), Why (user intention/emotion) and How (user gesture), called 5W1H. It also ensures that the application exploits a relatively accurate context to trigger personalized services. To show the usefulness of the proposed model, we apply it to the sensors and applications in ubiHome, a test bed for ubiComp-enabled home applications. Based on the experimental results, we believe that, without loss of generality, it can be extended to various context-aware applications in daily life.



1 Introduction 

Ubiquitous computing (ubiComp) allows users to get comprehensive services from ubiquitous computing resources in daily life [1][2]. The sensors and applications in a ubiComp-enabled environment will become more intelligent with the development of related technologies, such as embedded networking, pervasive sensing, and intelligent processing. In the near future, such a smart environment could provide personalized intelligent services without any explicit commands from the user. To achieve such intelligent services, the environment needs to obtain user-centered context information without distracting the users.

Over the last few years, various research activities on context-aware applications have been reported. For example, ACE (Adaptive Control of Home Environment) is a system that controls temperature and lighting conditions at home by learning the daily life patterns of the residents with a neural network [3]. Both EasyLiving [4] and AwareHome [5] have shown how context information can be used in the home environment. Meanwhile, MIM (Multimedia Interface Manager) showed how to recognize the user’s context through various modalities (i.e. seeing, hearing, touching) using a camera, a microphone, and a haptic glove [6]. Note, however, that the contexts used in those applications have different meanings and formats depending on the chosen application.
In this paper, we propose a unified context-aware application model that can be used in the ubiComp-enabled home environment [7]. The proposed model consists of two main blocks, ubiSensor and ubiService. The ubiSensor creates a preliminary context in the form of Who (user identity), What (object identity), Where (location), When (time), Why (user intention/emotion), and How (user gesture), called 5W1H, by monitoring the user in the environment. The ubiService determines an integrated context by merging preliminary contexts from various ubiSensors and generates the final context that triggers a user-centered service.

The proposed model has various advantages over conventional context models. For example, like the Context Toolkit [8][9], it does not use any mediation for context. However, it maintains independence between sensors and applications by splitting the role of the Context Toolkit between ubiSensor and ubiService: the ubiSensor generates a preliminary context instead of passing the sensed raw data directly to the Context Toolkit. The resulting context can be shared by all ubiServices and, thus, by all applications. As a result, context reusability is also ensured. The ubiService also ensures that the application exploits a relatively accurate context to trigger personalized services, by feeding the integrated context back to the ubiSensors.

This paper is organized as follows. In Section 2, we explain the basic terminology used in this paper. In Section 3, we describe the proposed unified context-aware application model in the ubiComp-enabled home environment. The implementation and experimental results are explained in Sections 4 and 5, respectively. Finally, conclusions and future work are discussed in Section 6.



2 Context for the UbiComp-Enabled Home 

Smart Home plays an important role as an application of UbiComp-enabled services. However, present Smart Homes focus either on home automation, which automatically controls doors, lights, elevators, etc. with device-control technology such as LONWORKS [10][11], or on home networking, which connects various information appliances. A UbiComp-enabled Home shall support not only home automation and home networking but also personalized services based on context. To implement the UbiComp-enabled Home, we have to overcome the restrictions that existing context-aware application models suffer from, especially the dependence between sensors and applications, and the inconsistent definitions of context.

Currently, each sensor in a UbiComp-enabled Home depends on its own service. Because of this dependence, developers of context-aware applications must modify a great deal of source code whenever a sensor is added, replaced, or removed. It is also hard to reuse sensors in other applications.

This dependence can be reduced by using smart sensors, which have sensing, processing, and networking capabilities. Such a sensor is connected to applications indirectly, through the network, and its processing generates unified information that several applications can use. It is therefore easy to add or delete a sensor, replace it with another, or reuse it in other applications. In this paper, the smart sensor converts signals into high-level context and transmits this context to applications. Specifically, it converts sensed signals into context in the form of 5W1H by its own processing and transmits the context to various applications through its own networking. It can therefore guarantee the independence of the sensor as well as its reusability across applications.

A UbiComp-enabled Home requires a well-defined context. However, in most of the applications reported, context is not well defined; previous context-aware applications mainly use ad hoc definitions tailored to the selected applications. For example, Schilit et al. define context as information about the user and objects, such as identity and location [12]. Dey et al. define context as information sensed by the application, such as the identity, location, activity and state of people, groups and objects [13]. Note, however, that such definitions may be inconsistent, i.e. they change with the selected applications, since each is suitable only for specific applications.

To solve this problem, we define 5W1H as a unified context so that it can be applied to all applications in the ubiComp environment [1][2]. In general, many context-aware applications retrieve information or trigger a service according to a part of 5W1H, such as user identity, location, and time. We therefore expect a unified context in the form of 5W1H to provide enough information for use by several applications, so that the unified context model exploiting 5W1H may work in most context-aware applications without loss of generality.

Applications in a UbiComp-enabled Home need to analyze context in order to support user-centered services. To obtain precise context, we define three levels of context: preliminary, integrated and final context. The preliminary context generated by a single sensor is not enough to trigger a proper service. In general, the context extracted from one sensor may be inaccurate or even incomplete, since a sensor may not generate all of 5W1H. We therefore introduce the integrated context and the final context. The integrated context is completely filled with 5W1H by merging the preliminary contexts from a set of sensors. The final context, a set of parameters to be used by a service function, is refined from it to trigger a user-centered service. As a result, an application developer may easily design context-aware applications by specifying the service-trigger condition as a 5W1H.



3 Ubi-UCAM: Unified Context-Aware Application Model

The proposed ubi-UCAM, a unified context-aware application model for the ubiquitous computing environment, consists of ubiSensor and ubiService, as shown in Fig. 1. The ubiSensors generate a set of preliminary contexts. The ubiSensors and the ubiServices then exchange context through embedded networking modules. The ubiService yields the integrated context by merging the preliminary contexts from a set of ubiSensors and generates the final context by refining the integrated context with the current state of the ubiService. In addition, the ubiService multicasts the integrated context to the ubiSensors currently connected to it, to help them update their preliminary contexts. The final context is used to trigger the user-centered service.
Fig. 1. Ubi-UCAM Architecture



3.1 UbiSensor 

The ubiSensor, as shown in Fig. 1, consists of a sensor module and a preliminary context decision module. The sensor module monitors the activities of the user in the environment. The context decision module then creates the preliminary context in the form of 5W1H by analyzing the sensed signals. As shown in Table 1, the preliminary context is decided using a predefined ‘context library’ for a specific application. Note that the ‘how’ and ‘why’ components of 5W1H, corresponding to the gesture/action and intention/emotion of the user, may require more complicated processing. However, to keep the problem simple, all elements of 5W1H are determined by the predetermined context library. Accordingly, ubiSensors referring to the same context library generate the same preliminary context.

The resulting preliminary context can be represented in the message format shown in Fig. 2. Using a tab character to separate the elements of 5W1H makes the preliminary context flexible to express. The ‘-’ character denotes an empty element, which arises because a sensor module cannot determine the whole 5W1H at once.



Preliminary Context / Integrated Context:
“who+\t+what+\t+where+\t+when+\t+how+\t+why+\0”
(an element that cannot be determined is written as ‘-’)

Fig. 2. Context Message Format
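The tab-separated message of Fig. 2 can be sketched as a small record type with an encoder and a decoder. The Context5W1H class is hypothetical. Field order follows Fig. 2 (who, what, where, when, how, why), and ‘-’ marks an undetermined element as described in the text. The sketch assumes that element values never contain a tab or NUL character. The sample user name and timestamp are invented.

```python
from dataclasses import dataclass, fields
from typing import Optional

EMPTY = "-"  # placeholder for an element the sensor could not determine


@dataclass
class Context5W1H:
    who: Optional[str] = None
    what: Optional[str] = None
    where: Optional[str] = None
    when: Optional[str] = None
    how: Optional[str] = None
    why: Optional[str] = None

    def encode(self) -> str:
        """Serialize as who\\twhat\\twhere\\twhen\\thow\\twhy\\0 (Fig. 2 order)."""
        parts = [getattr(self, f.name) or EMPTY for f in fields(self)]
        return "\t".join(parts) + "\0"

    @classmethod
    def decode(cls, message: str) -> "Context5W1H":
        """Parse a message back, mapping '-' to None."""
        parts = message.rstrip("\0").split("\t")
        if len(parts) != 6:
            raise ValueError(f"expected 6 elements, got {len(parts)}")
        return cls(*[None if p == EMPTY else p for p in parts])


# A ubiKey-like preliminary context: only who, where and when are known.
msg = Context5W1H(who="Alice", where="LivingRoom", when="200306231330").encode()
```

The same record type serves for integrated contexts, since Fig. 2 uses one format for both.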
Table 1. Context Library

Preliminary Context   Definition
Who                   User identity (user name)
What                  Real or virtual object related to user intention
Where                 Location of the user or the object
When                  Time (YearDateHourMinute)
How                   User gesture or action
Why                   User intention/emotion to control some services


3.2 UbiService

The ubiService, as shown in Fig. 1, consists of four main modules: the context integrator (CI), the context manager (CM), the interpreter (INT), and the service provider (SP). CI collects preliminary contexts over a given time interval (ΔT) from the set of ubiSensors connected to the ubiService, and decides the integrated context. As shown in Fig. 3, the preliminary contexts are aligned, and the elements of 5W1H in the same column are merged into the integrated context by voting. For ‘why’, we use a simple linear mapping, which could be improved by adopting a neural network. The resulting integrated context has a complete user-centered 5W1H and is forwarded to CM. Simultaneously, the integrated context is multicast to all ubiSensors.
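The column-wise voting step of CI can be sketched as follows. The dictionaries stand in for aligned preliminary contexts, and the sensor readings are invented. As a simplification, ‘why’ is voted on like the other elements rather than computed by the linear mapping the text describes. The ΔT collection window is assumed to have already elapsed, and a column that no sensor filled is left as ‘-’ rather than completed.

```python
from collections import Counter

FIELDS = ("who", "what", "where", "when", "how", "why")
EMPTY = "-"


def integrate(preliminary: list[dict]) -> dict:
    """Merge aligned preliminary contexts column by column by majority vote.

    Empty elements ('-' or missing) do not vote, and a column with no votes
    stays empty. Ties go to the value seen first, because Counter.most_common
    lists equal counts in insertion order.
    """
    integrated = {}
    for name in FIELDS:
        votes = Counter(p[name] for p in preliminary
                        if p.get(name, EMPTY) != EMPTY)
        integrated[name] = votes.most_common(1)[0][0] if votes else EMPTY
    return integrated


# Hypothetical preliminary contexts collected during one interval.
preliminary = [
    {"who": "Alice", "where": "LivingRoom", "how": "enter"},
    {"where": "Sofa", "how": "sitDown"},
    {"who": "Alice", "where": "LivingRoom", "how": "sitDown"},
]
merged = integrate(preliminary)
```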

CM compares the integrated context with the context conditions in a hash table to trigger SP, as shown in Fig. 4. If a context condition is matched, CM calls the function of SP that is associated with that condition; otherwise, CM discards the integrated context. The hash table stores context conditions as keys and information about functions as values. The table supports both 1:1 and N:1 relations between keys and values, and allows fast lookup of the integrated context among the context conditions. After delivering the selected information and the corresponding function to INT, CM runs the service function with the final context from INT.
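CM's lookup and INT's parameter mapping might be sketched like this. The choice of (where, how) as the key, the pause/play functions and the state dictionary are all hypothetical, since the text does not say which 5W1H elements a context condition covers. Two keys sharing one function illustrate the N:1 case, and an unmatched integrated context is discarded by returning None.

```python
from typing import Callable, Optional


def play(params: dict) -> str:
    return f"play {params['title']} for {params['who']}"


def pause(params: dict) -> str:
    return f"pause {params['title']} for {params['who']}"


# Hypothetical context conditions keyed on (where, how).
# Two keys map to the same function: the N:1 case.
SERVICE_TABLE: dict[tuple[str, str], Callable[[dict], str]] = {
    ("Sofa", "sitDown"): play,
    ("Sofa", "standUp"): pause,
    ("LivingRoom", "exit"): pause,
}


def interpret(integrated: dict, state: dict) -> dict:
    """INT: build the final context (function parameters) from the current state."""
    return {"who": integrated["who"], "title": state["current_title"]}


def dispatch(integrated: dict, state: dict) -> Optional[str]:
    """CM: hash lookup of the condition; discard the context if nothing matches."""
    service = SERVICE_TABLE.get((integrated["where"], integrated["how"]))
    if service is None:
        return None  # no matching context condition
    return service(interpret(integrated, state))
```

Keying on a tuple keeps the match a single hash lookup, which is the property the text attributes to the table; richer conditions would need a different key design.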

INT provides CM with the final context, e.g. the function name and parameters needed to trigger a specific SP function. The final context is generated by mapping the context condition to parameters based on the current state of the ubiService. SP is the set of functions to be triggered as services of the ubiService. Each function is associated with a context condition in the hash table and requires parameters to work. Fig. 5 shows the context flow among CI, CM, INT, and SP.

3.3 Networking 

The ubiSensor is connected to a network that provides a lookup service, which maintains attributes of the ubiSensors such as their state of connection with a ubiService, the kind of preliminary context they provide, etc. The ubiService queries the lookup service for ubiSensors with the needed attributes, and the lookup service returns information about the ubiSensors that can provide a preliminary context satisfying those attributes. After receiving this information, the ubiService connects directly to those ubiSensors. The connection between ubiSensor and ubiService is implemented with middleware such as JINI [14]. Each ubiSensor notifies the lookup service of its connection state whenever that state changes.
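The lookup step can be sketched with an in-memory registry standing in for the middleware. The LookupService and SensorEntry names and methods are hypothetical and are not the Jini API. Each entry records which 5W1H elements a sensor provides (taken from the c-MP sensor descriptions in Section 4) and whether the sensor is currently connected.

```python
from dataclasses import dataclass


@dataclass
class SensorEntry:
    name: str
    provides: set          # 5W1H elements its preliminary context can fill
    connected: bool = True


class LookupService:
    """In-memory stand-in for a lookup service (hypothetical, not Jini)."""

    def __init__(self) -> None:
        self._entries: dict[str, SensorEntry] = {}

    def register(self, entry: SensorEntry) -> None:
        self._entries[entry.name] = entry

    def notify(self, name: str, connected: bool) -> None:
        """Called by a ubiSensor whenever its connection state changes."""
        self._entries[name].connected = connected

    def find(self, needed: set) -> list[str]:
        """Names of connected sensors that provide all of the needed elements."""
        return sorted(e.name for e in self._entries.values()
                      if e.connected and needed <= e.provides)


lookup = LookupService()
lookup.register(SensorEntry("ubiKey", {"who", "where", "how", "when"}))
lookup.register(SensorEntry("ubiFloor", {"how", "when"}))
lookup.register(SensorEntry("SpaceSensor", {"how", "when"}))

gesture_sensors = lookup.find({"how", "when"})
lookup.notify("ubiFloor", connected=False)
after_disconnect = lookup.find({"how"})
```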
Fig. 3. Integrated Context Processing



[Figure: the integrated context is searched against Context Conditions #1 … #n in the hash table; when Integrated Context == Context Condition #i, the Information for Service Module #i is returned.]

Fig. 4. Searching Context Condition in Hash Table



4 Application: Context-Aware Movie Player

We applied the proposed ubi-UCAM to ‘ubiHome’, a test bed for ubiComp-enabled home applications. In ubiHome, several ubiSensors (e.g. portable memory, IR sensor, on/off sensor, 3D camera, etc.) provide preliminary contexts in the form of 5W1H, corresponding to user/object identity, location, gesture, time, etc. To show the usefulness of the proposed model, we developed a ubiService called c-MP (Context-aware Movie Player).
Fig. 5. Context Flow in ubiService Components

The c-MP provides residents in ubiHome with user-centered, context-aware services based on contexts such as the user's identity (Who), the user's location (Where), time (When), gesture (How), the object for the movie player (What), and the user's intention to control the movie player (Why). For example, after a resident enters the living room with a ubiKey, he/she sits down on the sofa in front of the TV. A ubiService menu then automatically appears on the monitor. If the resident selects the movie player from the menu, the c-MP displays a list of movie titles with a per-user viewing history, as shown in Fig. 6. When the resident rises from the sofa, the c-MP automatically pauses the movie. If he/she returns from the kitchen with snacks or beverages and sits down within 30 seconds, the c-MP resumes the movie. If he/she does not come back within 30 seconds or leaves ubiHome, the c-MP saves the paused position and time and stops automatically. The resident can control the movie player with gestures as well as with a remote controller [15]. For example, he/she can increase the volume by raising the right hand and decrease it by lowering that hand. Likewise, he/she can enlarge the screen by raising the left hand and shrink it by lowering that hand.
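The 30-second pause/resume rule in this scenario can be written as a small decision function. The name `paused_action`, the event labels, and the use of plain seconds as floats are illustrative assumptions, not part of the c-MP implementation.

```python
RESUME_WINDOW_S = 30.0  # from the scenario: resume only if the resident returns within 30 s


def paused_action(paused_at, now, event=None):
    """Decide what a paused c-MP does; times are in seconds (hypothetical helper)."""
    elapsed = now - paused_at
    if event == "Exit" or elapsed > RESUME_WINDOW_S:
        return "save_and_stop"   # keep the paused position and time, then stop
    if event == "SitDown":
        return "resume"
    return "stay_paused"


print(paused_action(0.0, 12.0, "SitDown"))   # resume
print(paused_action(0.0, 45.0, "SitDown"))   # save_and_stop
print(paused_action(0.0, 5.0, "Exit"))       # save_and_stop
```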




[Fig. 6. Example of Context-aware Service (screen showing User Identity, Movie Title, and State)]
The c-MP obtains preliminary contexts from several ubiSensors, such as ubiKey [16], ubiFloor [17], CoachSensor [16] and SpaceSensor [15], as shown in Fig. 7. The ubiKey, which uses portable memory as its sensor module, generates the identity of the user (Who), location (Where), entering/exiting information (How), and entering/exiting time (When). The ubiFloor, in which an on/off sensor is attached to every 2 cm × 5 cm cell, yields the position relative to the TV (How) and time (When). The CoachSensor, in which three on/off sensors are embedded in the couch, determines the body pose, i.e., standing up or sitting down (How), and time (When). The SpaceSensor, which uses a 3D camera, analyzes hand/body gestures (How) and time (When).

As shown in Table 2, each ubiSensor generates preliminary contexts based on the context library of ubiHome. For example, when the user S. Jang (a resident of ubiHome) enters the living room, the ubiKey creates a context message such as “sJang\t-\tLivingRoom\t200301271940\tEnter\t-”. When he sits down on the sofa in front of the TV, the CoachSensor generates a context message such as “-\t-\tCoach\t200301271942\tSitDown\t-”. If he stands up on the ubiFloor and moves toward the TV, the ubiFloor generates a context message such as “sJang\tTV\t-\t200301271944\tComing\t-”. If he raises his right hand, the SpaceSensor generates a context message such as “-\t-\t-\t200301271800\tRightHandUp\t-”. Finally, all context messages are delivered to the c-MP.
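The messages above are tab-separated 5W1H fields in the order Who, What, Where, When, How, Why, with "-" marking an element the sensor does not provide. A minimal parser, assuming exactly six fields per message, might look like this; `Context5W1H` and `parse_context` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Context5W1H(NamedTuple):
    who: Optional[str]
    what: Optional[str]
    where: Optional[str]
    when: Optional[str]   # YearMonthDayHourMinute, e.g. "200301271940"
    how: Optional[str]
    why: Optional[str]


def parse_context(message: str) -> Context5W1H:
    """Split a tab-separated 5W1H message; '-' becomes None (element not provided)."""
    fields = [f.strip() for f in message.split("\t")]
    if len(fields) != len(Context5W1H._fields):
        raise ValueError(f"expected 6 fields, got {len(fields)}: {message!r}")
    return Context5W1H(*(None if f == "-" else f for f in fields))


print(parse_context("sJang\t-\tLivingRoom\t200301271940\tEnter\t-"))
# Context5W1H(who='sJang', what=None, where='LivingRoom', when='200301271940', how='Enter', why=None)
```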



Table 2. Example of Context Library for ubiHome

Preliminary Context   Definition
Who                   Name of resident in ubiHome
                      e.g., wWoo, sJang, yOh, sLee, dHong, sKim, yLee, ySuh, sOh,
                      mLee, sjOh, wLee, shLee, kKim, smjung
What                  Service object in ubiHome
                      - real object: Light, TV, MoviePlayer, AV Player, Movie Title
                      - virtual object: Volume, Speed, Size, Luminosity
Where                 Location information of ubiHome
                      - LivingRoom, Kitchen, BedRoom
When                  Time (YearMonthDayHourMinute)
                      - 200301271900
How                   User gestures enumerated in a predefined form for ubiHome
                      - Enter, Exit, SitDown, StandUp, Coming, Going, G(Select),
                        G(Play), G(Stop), G(Pause), G(FastForward), G(VolumeUp),
                        G(VolumeDown), G(SizeUp), G(SizeDown), G(TurnOn),
                        G(TurnOff), G(Bright), G(Dark)
Why                   User intention and emotion
                      - Intention: to-Play, Select, Stop, Pause, Increase, Decrease,
                        TurnOn, TurnOff
                      - Emotion: Happy, Angry, Sleepy, Active



The c-MP consists of CI, CM, INT and SP. The CI, as shown in Fig. 8, gathers preliminary contexts every 0.5 seconds. It then fills an integrated context with the 4W1H elements determined by voting, leaving ‘Why’ empty. The remaining element ‘Why’ can be determined by a lookup table or by neural networks. The CM searches the hash table for a context condition that matches the integrated context. If a match occurs, the CM triggers a service function of c-MP with the final context. The INT translates the resulting integrated context into the final context, in the form of parameters, considering the current state of the ubiService.
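The CI and CM steps can be illustrated as follows. The paper does not specify the voting rule or the contents of the hash table, so this sketch assumes a per-element majority vote (ties go to the first value seen), a hypothetical `WHY_TABLE` lookup for ‘Why’, and a hypothetical `CONTEXT_CONDITIONS` dictionary keyed on (How, Why) that maps to c-MP service functions. A Python dict is itself a hash table, so the CM search is a single average-O(1) lookup.

```python
from collections import Counter

ELEMENTS_4W1H = ("who", "what", "where", "when", "how")

# Hypothetical lookup table for the 'Why' element.
WHY_TABLE = {"StandUp": "toPause", "SitDown": "toReplay", "G(Play)": "toPlay"}

# Hypothetical hash table: context condition (how, why) -> c-MP service function.
CONTEXT_CONDITIONS = {
    ("StandUp", "toPause"): "Pause",
    ("SitDown", "toReplay"): "Play",
    ("G(Play)", "toPlay"): "Play",
}


def integrate(preliminary):
    """CI step: majority vote per 4W1H element over one 0.5 s window of contexts."""
    integrated = {}
    for element in ELEMENTS_4W1H:
        votes = Counter(c[element] for c in preliminary if c.get(element) is not None)
        integrated[element] = votes.most_common(1)[0][0] if votes else None
    integrated["why"] = WHY_TABLE.get(integrated["how"])  # fill the empty 'Why'
    return integrated


def match(integrated):
    """CM step: one hash-table lookup; None when no context condition matches."""
    return CONTEXT_CONDITIONS.get((integrated["how"], integrated["why"]))


window = [
    {"who": "sJang", "where": "LivingRoom", "how": "StandUp"},
    {"where": "Coach", "how": "StandUp"},
    {"how": "Coming"},
]
ctx = integrate(window)
print(ctx["how"], ctx["why"], match(ctx))   # StandUp toPause Pause
```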




[Fig. 7. Example of ubiSensor and ubiService: ubiKey, ubiFloor, CoachSensor, and SpaceSensor each deliver a preliminary context (who what where when how why) to the ubiService]

[Fig. 8. Context Flow of c-MP]
Fig. 9 shows the relationship between the states of c-MP and the integrated contexts. The SP supports control functions such as Play(), Stop(), Select(), Size(), Volume(), Pause(), and FF() according to the context condition.



[Fig. 9. States & Integrated Contexts of c-MP. States: Start, Select, Play, Pause, FastForward, Stop; each transition is triggered by an integrated context built from ubiKey (Enter/Exit), Coach (SitDown/StandUp), SpaceSensor gestures, or PDA/Mouse selections. Gesture notation: G(x) denotes How; G(p) = G(Play), G(s) = G(Stop), G(pa) = G(Pause), G(sup) = G(SizeUp), G(sdw) = G(SizeDown), G(vup) = G(VolumeUp), G(vdown) = G(VolumeDown), G(ff2) = G(FastForward).]
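Read as a state machine, the transitions of Fig. 9 might be encoded as a lookup table keyed on (state, event). The event names are simplified How values, and because the figure is only partly legible, the transition set below is an approximation rather than the authors' specification; in particular, the Start and Select transitions are simplified.

```python
# (state, event) -> next state, approximated from Fig. 9; events are simplified How values.
TRANSITIONS = {
    ("Select", "G(Play)"): "Play",
    ("Play", "G(Pause)"): "Pause",
    ("Play", "StandUp"): "Pause",
    ("Pause", "G(Play)"): "Play",
    ("Pause", "SitDown"): "Play",
    ("Stop", "G(Play)"): "Play",
    ("Stop", "SitDown"): "Play",
    ("Play", "G(Stop)"): "Stop",
    ("Play", "Exit"): "Stop",
    ("Pause", "Exit"): "Stop",
    ("Play", "G(FastForward)"): "FastForward",
    ("FastForward", "G(Play)"): "Play",
    ("Stop", "Exit"): "Start",
}


def step(state, event):
    """Return the next c-MP state; unmatched events (e.g. G(VolumeUp) in Play) keep the state."""
    return TRANSITIONS.get((state, event), state)


state = "Select"
for event in ["G(Play)", "StandUp", "SitDown", "G(VolumeUp)", "Exit"]:
    state = step(state, event)
    print(event, "->", state)
# G(Play) -> Play, StandUp -> Pause, SitDown -> Play, G(VolumeUp) -> Play, Exit -> Stop
```

Volume and size gestures leave the player in Play, matching the self-loops in the figure, so they need no explicit entries.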



5 Experiments 

To show the usefulness of the proposed ubi-UCAM, we applied it to a context-aware application, c-MP, and compared it with a non-context-aware application, WinAmp, a normal movie player with a new skin (java-juke) [18]. Fourteen volunteers (two in their fifties, one in their forties, three in their thirties, six in their twenties, and two in their teens) tested both applications and reported on convenience and satisfaction.

Assuming that a user was in ubiHome to watch a movie, we measured the time and the number of explicit commands required to start a movie on the TV, the waiting time per explicit command, and the CPU usage of the computer (CPU: Pentium III 800 MHz, Memory: 256 MB, OS: Windows XP). The results are shown in Table 3. When a user watches a movie with WinAmp, he/she must click several times to select a movie, which demands considerable attention and time. With c-MP, in contrast, he/she needs only two or three explicit commands to select a movie, since the c-MP automatically provides a user-centered list of movies according to his/her preferences and previous activities. Therefore, c-MP requires relatively less attention and less time than WinAmp. The main trade-offs are waiting time and CPU usage, because the c-MP requires processing to obtain a proper context.



Table 3. Quantitative Factors

                                     WinAmp        c-MP
Time duration                        20-35 sec     8-12 sec
# of explicit commands               5-12          2-3
Waiting time per explicit command    100-350 ms    500-1200 ms
CPU usage                            10-15%        30-40%



We also analyzed the complexity of learning and using each player; the results are summarized in Table 4. Most participants had no problem learning how to control WinAmp with the new skin because they were already familiar with WinAmp, whereas they spent some time learning the gesture-based commands of the c-MP. Note, however, that once familiar with the c-MP, they quickly adapted to the new interface. Additionally, most of them were satisfied with the personalized movie list, which showed the status of each movie (to be watched, paused, not to be watched) with time information. In particular, participants in their fifties were positive about controlling the movie player by gesture because they could pay attention to the movie without an annoying remote controller. The teenagers were interested in the auto-play/pause/stop functions because they often moved around ubiHome.



Table 4. Qualitative Factors

                       WinAmp      c-MP
Learning complexity    Easy        Easy
Learning time          1 minute    2-3 minutes
Usage complexity       Normal      Easy
Satisfaction           60%         80%



6 Discussion 

In this paper, we proposed the ubi-UCAM, a unified context-aware application model for the ubiquitous computing environment, and applied it to a ubiComp-enabled home application. The proposed model introduces a unified context in the form of 5W1H that can be shared by various applications, and it specifies the roles of context (i.e., preliminary context, integrated context, final context). According to the experimental results, the proposed model preserves the independence between sensors and applications by using a unified context in the form of 5W1H, and exploits relatively accurate context to trigger personalized services. However, the expression of context must be extended, because it is difficult to represent complex contexts for all applications using only the 5W1H form.
Abstract. To account for a type of contextual effect on word order, some researchers propose theme-first (old things first) principles. However, their universality has been questioned due to the existence of counterexamples and the possibility of arguably rheme-first (new things first) languages. Capturing the contextual effects on theme-rheme ordering (information structure) in terms of information theory, this paper argues that word order is affected by the distribution of informativeness, an idea also consistent with counterexamples and rheme-first languages.



1 Introduction 

Various contextual effects on word order have been a topic of active research since at least the eighteenth century [1]. Many have noted that old information comes before new information [2-4]. The old and new components of an utterance are often called the theme and the rheme, respectively, and the theme-rheme organization is called information structure.¹ Accordingly, the idea of “old things first” is also called the theme-first principle.

The theme-first principle seems able to account for certain word order phenomena, especially in free-order languages such as Czech [7]. Nevertheless, the proposal cannot be maintained in its stated form, because there are a number of counterexamples, such as the following. Note that boldface represents phonological prominence.

(1) a. Who knows the secret? 

b. [Peter]_Rheme [knows it]_Theme.

In the response in this example, the sentence-initial position is the rheme with 
new information corresponding to the wh-word in the question. 

Furthermore, Lambrecht points out that a greater problem for the theme-first principle is the existence of arguably rheme-first languages [1]. For example, Mithun reports data from the Siouan, Caddoan, and Iroquoian languages and

¹ The contrast between theme and rheme is also referred to as the contrast between topic and focus, respectively. This paper uses the terms theme and rheme, focusing on the essence found in the contrast observed in many studies. Note that we assume that information structure is a binary partition at the utterance level [5,6].

P. Blackburn et al. (Eds.): CONTEXT 2003, LNAI 2680, pp. 190-203, 2003. 
argues that these languages have a rheme-first tendency [8]. Similar data have also been reported for other languages [9-11]. Although it is not obvious that these are indeed rheme-first languages, the data still show a consistent pattern that differs considerably from that of more theme-first languages.

Now that we cannot maintain the theme-first principle, at least as stated earlier, we must ask whether anything general can still be said about the contextual effects on word order in connection with information structure. Counterexamples in languages like English do not seem to be abundant. In addition, rheme-first languages seem to be limited to a small number. If different word order principles applied to different languages in an ad hoc way, this would pose a challenge to developing a universal account of language as a human cognitive process. Since information structure has been associated with word order in various forms, e.g., by the Prague school [7] and in Halliday's strict theme-rheme ordering [4], the above observation may undermine the role of information structure.

This paper develops an idea in Vallduví [12], who cites Dretske [13], regarding the notion of information (in terms of entropy), and analyzes word order from that point of view. In this connection, we also discuss the definition of information structure based on information theory.

The main hypothesis discussed here is that information structure is a means to even out the information load carried by the theme and the rheme of an utterance (referred to as information balance). We can then show that ordering a low-entropy theme before a high-entropy rheme is more desirable than the reverse ordering, which we regard as the universal principle behind the theme-first tendency. However, if the theme is totally predictable (i.e., has zero entropy), the ordering does not affect the information balance. This situation appears to correspond to apparent exceptions to the theme-first principle.

Word order is a complex phenomenon involving lexical, syntactic, and pragmatic constraints [14]. This paper inevitably leaves out certain important aspects, such as word order within a phrase, where morpho-syntax tends to fix word order quite rigidly.

The rest of this paper is organized as follows. Section 2 introduces an analysis of the theme-first principle based on information theory. Section 3 discusses various rheme-first cases and analyzes whether they can be accounted for within the current approach. Section 4 presents an information-theoretic definition of information structure.



2 Information-Theoretic Analysis of Word Order 

In this section, we discuss the idea of applying information theory to the analysis 
of the theme-first principle using the following short discourse, where the second 
utterance is partitioned into a theme and a rheme. 

(2) i. John has a house. 

ii. [The door]_Theme [is purple]_Rheme.
Compared to the above example, the following alternative appears less natural.

(3) i. John has a house. 

ii. [Purple]_Rheme [the door is]_Theme.

The difference will be analyzed later in this section. 



2.1 Basic Entropy Computation 

This subsection discusses a way to compute the entropies of a theme and a rheme 
as independent events, using example (2). Immediately after the first utterance 
in the example, the speaker might want to talk about either the roof or the door, 
something related to the house, or even a completely different subject. For each of 
these subjects, there may be a variety of possible predicates, e.g., large, wooden, 
flat, expensive, purple, and so on. Although it is possible to demonstrate the 
computation of entropy for an arbitrarily complicated case, we use the following 
simplified scenario for presentation purposes: two choices for the theme between 
the door and the roof, and five choices for the rheme among yellow, red, orange, 
pink, and purple. 

Roughly speaking, with more choices, the likelihood of choosing a particular option is smaller. In other words, a single choice among many options is more informative than one among fewer options. This idea can be formally represented using the notion of entropy (good introductions include [15, 16]). Informally, high entropy is associated with high informativeness, low predictability, high uncertainty, more surprise, etc. The use of entropy has been discussed even in linguistics and philosophy [17-19]. For example, while Cherry suggests its usefulness [19], Bar-Hillel is more cautious, saying that information is different from meaning [17]. Naturally, the focus of this paper is not on meaning, but on word order.

In the very special case where all the events are equally likely (uniform distribution), the entropy of an event is directly related to the number of choices. In terms of probability, the chance of hitting a particular choice out of n choices is 1/n. Entropy is a measure related to this probability, but it is adjusted logarithmically so that it is additive, in accordance with human intuition. For n equally likely outcomes $x_1, \ldots, x_n$, each with probability $p = 1/n$, the entropy is defined as a function $H_{\mathrm{uniform}}$ on real numbers:²

$$H_{\mathrm{uniform}}(p) = \log_2 n = -\log_2 (1/n) = -\log_2 p .$$

For example, under the current scenario for example (2), the entropy of the theme with two choices is $\log_2 2 = 1.0$, and the entropy of the rheme with five choices is $\log_2 5 \approx 2.322$.
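The two uniform-case values can be checked directly. This is a minimal sketch; `h_uniform` is an illustrative name, not a function from the paper.

```python
import math


def h_uniform(n: int) -> float:
    """Entropy in bits of one choice among n equally likely outcomes: log2 n = -log2(1/n)."""
    return math.log2(n)


print(f"{h_uniform(2):.3f}")   # theme, door vs. roof: 1.000
print(f"{h_uniform(5):.3f}")   # rheme, five colours: 2.322
```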

² The use of base 2 is convenient as it enables us to measure entropy in bits.

Entropy is a general function that can also be applied to an event X with n outcomes $[x_1, \ldots, x_n]$ and the corresponding probability distribution $[p_1, p_2, \ldots, p_n]$. Here, $p_i$ is the probability of $x_i$, i.e., the shorthand for $P(X = x_i)$ or $P(x_i)$. Naturally, we must have $\sum_{i=1}^{n} p_i = 1$. For a particular outcome $x_i$, the (pointwise) entropy is $-\log_2 p_i$. We now compute the weighted average of the information over all the outcomes. That is, we multiply the ith entropy by its own probability, $p_i$, and then add them all (averaging makes sense due to the logarithmic conversion). Let us denote the probability distribution in question as $\mathbf{p}$ (boldface represents a vector, a list of values). Then, the entropy function H is defined as follows:

$$H(\mathbf{p}) = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (4)$$

For example, if the five choices of the rheme in example (2) have the probability distribution $\mathbf{r} = [0.275, 0.15, 0.15, 0.15, 0.275]$, the entropy $H(\mathbf{r})$ is $-(2 \times 0.275 \log_2 0.275 + 3 \times 0.15 \log_2 0.15) \approx 2.256$.
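Equation (4) translates directly into code. The sketch below skips zero-probability terms (the usual convention that $0 \log_2 0 = 0$) and rejects vectors that do not sum to 1; the function name `entropy` is illustrative.

```python
import math


def entropy(p):
    """Shannon entropy in bits, H(p) = -sum_i p_i log2 p_i, as in equation (4)."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


r = [0.275, 0.15, 0.15, 0.15, 0.275]
print(f"{entropy(r):.3f}")          # 2.256
print(f"{entropy([0.2] * 5):.3f}")  # the uniform case reduces to log2 5: 2.322
```

The non-uniform distribution r yields slightly less entropy than the uniform one over the same five choices, as expected.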

2.2 Dependency between Two Events 

Although the entropies for a theme and a rheme were assumed independent in the 
previous subsection, the choice of the latter component would naturally depend 
on the choice of the former. For instance, in example (2), the predicates for the 
roof and those for the door are likely to have different probability distributions. 
In order to analyze the dependency between theme and rheme, this subsection 
introduces some basic ideas about entropies of two events. 

We now consider two events X and Y. Suppose that event X has two possibilities, $x_1$ and $x_2$, and event Y has two possibilities, $y_1$ and $y_2$. Then, the joint probability for each combination of $x_i$ and $y_j$ can be summarized as follows:

$$\begin{array}{c|cc} & y_1 & y_2 \\ \hline x_1 & p_{11} & p_{12} \\ x_2 & p_{21} & p_{22} \end{array} \qquad (5)$$

Naturally, the sum of all the probabilities must satisfy $\sum_i \sum_j p_{ij} = 1$.

At this point, we extend the definition of entropy (4) to a two-event situation, summing over both of the events. For events X and Y with m and n possibilities, respectively, we have the joint probability $p_{ij}$ for $x_i$ and $y_j$. Then, the joint entropy of the two events is defined as follows:

$$H(X,Y) = -\sum_{j=1}^{n} \sum_{i=1}^{m} p_{ij} \log_2 p_{ij}$$

As an example, let us consider the following joint probability distribution for X and Y:

$$\begin{array}{c|cc} & y_1 & y_2 \\ \hline x_1 & 0.1 & 0.2 \\ x_2 & 0.3 & 0.4 \end{array} \qquad (6)$$

Then, the joint entropy can be computed as follows:

$$H(X,Y) = -(0.1\log_2 0.1 + 0.2\log_2 0.2 + 0.3\log_2 0.3 + 0.4\log_2 0.4) \approx 1.846 .$$
Since the joint probability already contains the complete information about the two events, knowing X and Y separately would generally lead to some redundancy. For example, since $H(X) + H(Y) \approx 0.881 + 0.971 = 1.852$, we see that $H(X,Y) < H(X) + H(Y)$.

We now consider the information measure that corresponds to $H(X,Y) - H(X)$. Since Y is conditioned on X, it is called the conditional entropy, represented as $H(Y|X)$. Analogously, we can also consider $H(X|Y)$. Then, the following equation relates the information measures discussed so far:

$$H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)$$

Returning to example (6), we have $H(X,Y) = H(X) + H(Y|X) \approx 0.881 + H(Y|X)$. Thus, we know that $H(Y|X) \approx 0.965$, which is less than $H(Y) \approx 0.971$. Since conditioning never increases uncertainty, we have the following inequality in general: $H(X|Y) \le H(X)$.

Another measure, called mutual information, indicates the degree of dependence between two events and is defined by the following equation:

$$I(X;Y) = H(X) + H(Y) - H(X,Y) .$$
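All the measures of this subsection follow from the joint table via the chain rule. The sketch below assumes the four cells of example (6) are arranged as rows x1 = [0.1, 0.2] and x2 = [0.3, 0.4]; this arrangement is an assumption chosen because its marginals (0.3, 0.7) and (0.4, 0.6) reproduce the reported H(X) ≈ 0.881 and H(Y) ≈ 0.971. The helper names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def info_measures(joint):
    """joint[i][j] = P(X = x_i, Y = y_j). Returns the measures of Sect. 2.2 in bits."""
    h_xy = entropy([p for row in joint for p in row])
    h_x = entropy([sum(row) for row in joint])         # marginal of X (row sums)
    h_y = entropy([sum(col) for col in zip(*joint)])   # marginal of Y (column sums)
    return {
        "H(X,Y)": h_xy,
        "H(X)": h_x,
        "H(Y)": h_y,
        "H(Y|X)": h_xy - h_x,      # chain rule: H(X,Y) = H(X) + H(Y|X)
        "H(X|Y)": h_xy - h_y,      # chain rule: H(X,Y) = H(Y) + H(X|Y)
        "I(X;Y)": h_x + h_y - h_xy,
    }


example6 = [[0.1, 0.2],
            [0.3, 0.4]]
for name, value in info_measures(example6).items():
    print(f"{name} = {value:.3f}")
# H(X,Y) = 1.846, H(X) = 0.881, H(Y) = 0.971, H(Y|X) = 0.965, H(X|Y) = 0.875, I(X;Y) = 0.006
```

The small mutual information (about 0.006 bits) reflects how weakly X and Y depend on each other in this example.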



2.3 Information Balance 



We now apply the ideas introduced in the previous subsections to our analysis of word order. We use example (2) with the following probability distribution for the theme and the rheme ($t_1$ and $t_2$ refer to the two theme choices, and $r_j$ refers to one of the five rheme choices).



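$$\begin{array}{c|ccccc|c} & r_1 & r_2 & r_3 & r_4 & r_5 & \\ \hline t_1 & 0.25 & 0.125 & 0.075 & 0.025 & 0.025 & 0.5 \\ t_2 & 0.025 & 0.025 & 0.075 & 0.125 & 0.25 & 0.5 \\ \hline & 0.275 & 0.15 & 0.15 & 0.15 & 0.275 & \end{array} \qquad (7)$$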





How we can actually come up with such a probability distribution is a difficult 
question. Since some possibilities can be related to the context through inference 
(linguistic and extra-linguistic), it naturally involves the kind of difficulty faced 
in many pragmatic studies. Next, there is a question of whether the probability 
distribution under discussion should be understood only from the speaker’s point 
of view. In addition, the notion of joint entropy involves the connection between 
two events, which also requires analysis. For the present discussion, we assume 
that the probability distributions for the theme and the rheme are available, and 
we will build arguments based on this assumption. 

The entropies of the theme, the rheme, and the entire utterance (computed independently) are $H(T)$, $H(R)$, and $H(T,R)$, respectively. If the rheme is delivered after the theme, we consider the conditional entropy of the rheme after excluding the effect of the theme, i.e., $H(R|T)$. Then, $H(T,R) = H(T) + H(R|T)$. On the other hand, if the utterance is made in the rheme-theme order, we have $H(T,R) = H(R) + H(T|R)$. In the following, for the entropy of the latter component, be it the rheme or the theme, we always use the conditional entropy. The basic information measures for example (7) are computed as follows:
$$H(T) = -(0.5\log_2 0.5 + 0.5\log_2 0.5) = 1.000$$

$$H(R) = -(2 \times 0.275\log_2 0.275 + 3 \times 0.15\log_2 0.15) \approx 2.256$$

$$H(T,R) = -(2 \times 0.25\log_2 0.25 + 2 \times 0.125\log_2 0.125 + 2 \times 0.075\log_2 0.075 + 4 \times 0.025\log_2 0.025) \approx 2.843$$

$$H(R|T) = H(T,R) - H(T) \approx 1.843$$

$$H(T|R) = H(T,R) - H(R) \approx 0.587$$

$$I(T;R) = H(T) + H(R) - H(T,R) \approx 0.413 .$$

In order to compare the evenness of the information distribution between theme and rheme, we introduce a measure, information balance, defined as follows:

Definition 1. Information balance: The standard deviation of the entropies of 
the theme and the rheme (of an utterance) for a particular ordering. 

Note again that the entropy of the latter component is a conditional entropy. 
With this definition, the main proposition of this paper can be described as 
follows: 

Proposition 1. The information structure with a lower information balance is 
preferred. 

Next, let us compute the information balance of the theme-rheme (rheme-theme) ordering, denoted as $\sigma_{TR}$ ($\sigma_{RT}$). To do so, we first compute the average of the entropies for the theme and the rheme, which is identical for both orders: $E_{TR} = E_{RT} = H(T,R)/2 \approx 1.421$.

$$\sigma_{TR} = \sqrt{\left([H(T) - E_{TR}]^2 + [H(R|T) - E_{TR}]^2\right)/2} \approx 0.421$$

$$\sigma_{RT} = \sqrt{\left([H(R) - E_{RT}]^2 + [H(T|R) - E_{RT}]^2\right)/2} \approx 0.835$$

Thus, we have $\sigma_{TR} < \sigma_{RT}$.

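For both of the word orders, the relevant entropy measures and information balances are summarized below.

(8) a.
$$\begin{array}{ccc} \text{Theme} & \text{Rheme} & \text{Information Balance} \\ H(T) & H(R|T) & \sigma_{TR} \\ 1.000 & 1.843 & 0.421 \end{array}$$

b.
$$\begin{array}{ccc} \text{Rheme} & \text{Theme} & \text{Information Balance} \\ H(R) & H(T|R) & \sigma_{RT} \\ 2.256 & 0.587 & 0.835 \end{array}$$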

This shows that the theme-rheme order has a more even distribution of entropies 
than the rheme-theme order. That is, it would be easier for the listener to process 
the information in the theme-rheme order. 
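The whole computation for example (7) can be reproduced from the joint table alone. This is a minimal sketch of Definition 1 (population standard deviation of the two entropies, with the second one conditional); the function name `information_balance` is illustrative.

```python
import math


def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def information_balance(joint):
    """joint[i][j] = P(theme t_i, rheme r_j). Returns (sigma_TR, sigma_RT) per Definition 1."""
    h_tr = entropy([p for row in joint for p in row])
    h_t = entropy([sum(row) for row in joint])
    h_r = entropy([sum(col) for col in zip(*joint)])
    mean = h_tr / 2   # E_TR = E_RT: both orders split the same total H(T,R)

    def sd(a, b):     # population standard deviation of two values with mean `mean`
        return math.sqrt(((a - mean) ** 2 + (b - mean) ** 2) / 2)

    sigma_tr = sd(h_t, h_tr - h_t)   # theme first: H(T), then H(R|T)
    sigma_rt = sd(h_r, h_tr - h_r)   # rheme first: H(R), then H(T|R)
    return sigma_tr, sigma_rt


table7 = [
    [0.25, 0.125, 0.075, 0.025, 0.025],   # t1
    [0.025, 0.025, 0.075, 0.125, 0.25],   # t2
]
s_tr, s_rt = information_balance(table7)
print(f"sigma_TR = {s_tr:.3f}, sigma_RT = {s_rt:.3f}")   # sigma_TR = 0.421, sigma_RT = 0.835
```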

Now, we can formulate the principle underlying the theme-first tendency as 
follows: 
Theorem 1. (Informally) If the entropy of the theme is lower than that of the rheme, the theme-rheme ordering is never worse than the other ordering with respect to information balance. (Formally) If $H(T) < H(R)$, then $\sigma_{TR} \le \sigma_{RT}$.

The above theorem is interesting on the following two points: (i) it predicts that 
the theme-rheme ordering is preferred, and (ii) it can also specify under what 
condition there is no difference between the two orders. Here is a proof. 

Proof. First, the information balance for two events X and Y in that ordering, with $E = H(X,Y)/2$, is computed as follows:

$$\sigma_{XY} = \sqrt{\left([H(X) - E]^2 + [H(X,Y) - H(X) - E]^2\right)/2} = \sqrt{\left([H(X) - E]^2 + [-H(X) + E]^2\right)/2} = |H(X) - E| .$$

Let us consider the (independent) entropies for T and R as $H(T)$ and $H(R)$, respectively. Since $H(T) < H(R)$, we have $\sigma_{TR} = |H(T) - E_{TR}|$ and $\sigma_{RT} = |H(R) - E_{TR}|$. Then, applying $H(X,Y) = H(Y) + H(X|Y)$ and $H(X|Y) \le H(X)$, we have the following:

$$\sigma_{TR} - \sigma_{RT} = [E_{TR} - H(T)] - [H(R) - E_{TR}] = H(T,R) - H(R) - H(T) = H(T|R) - H(T) \le 0 .$$

Therefore, $\sigma_{TR} \le \sigma_{RT}$. □
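As an empirical sanity check (not a proof), Theorem 1 can be tested on random joint distributions. Algebraically, $\sigma_{TR}^2 - \sigma_{RT}^2 = (H(T) - H(R)) \cdot I(T;R)$, which is never positive when $H(T) < H(R)$, so no counterexample should appear. The sampling scheme (uniform random weights, normalized) and the tolerance are arbitrary choices for this sketch.

```python
import math
import random


def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)


def balances(joint):
    h_tr = entropy([p for row in joint for p in row])
    h_t = entropy([sum(row) for row in joint])
    h_r = entropy([sum(col) for col in zip(*joint)])
    e = h_tr / 2
    # For two values whose mean is e, the standard deviation is |value - e|.
    return h_t, h_r, abs(h_t - e), abs(h_r - e)


def random_joint(m, n, rng):
    w = [[rng.random() for _ in range(n)] for _ in range(m)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]


def check_theorem(trials=10_000, seed=0):
    """Return a counterexample joint table if one is found, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        joint = random_joint(rng.randint(2, 4), rng.randint(2, 6), rng)
        h_t, h_r, s_tr, s_rt = balances(joint)
        if h_t < h_r and s_tr > s_rt + 1e-12:
            return joint
    return None


print(check_theorem())   # None
```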



2.4 Special Cases 

As suggested in the previous subsection, the information balance can be the same for both the theme-rheme and rheme-theme orders in certain cases.

First, the theme and the rheme could carry exactly the same information (i.e., be completely dependent), so that $H(T,R) = H(T) = H(R)$. However, this case is unlikely in reality.

Second, if the theme and the rheme are completely independent, i.e., $I(T;R) = 0$, the joint entropy is the sum of $H(T)$ and $H(R)$, i.e., $H(T,R) = H(T) + H(R|T) = H(T) + H(R)$. Thus, the information balance does not depend on the theme-rheme ordering. As we noted earlier, it is more likely that the theme and the rheme have some informational dependency, and thus this case would be atypical. However, there is an important special subcase. If the theme is completely predictable, i.e., $H(T) = 0$, the entire information depends solely on $H(R) = H(T,R)$, and $\sigma_{TR} = \sigma_{RT}$: the information balance is computed from the entropies zero and $H(R)$ regardless of the word order. This situation corresponds to Lambrecht's statement: if the theme (his topic) is established, there is no need for it to appear sentence-initially [1]. The symmetrical case where $H(R) = 0$ is unlikely because we can assume that the rheme always carries some information.
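Both special cases can be checked numerically with the rheme distribution of example (2). The two joint tables below are constructed for illustration: one with a single, fully predictable theme choice (H(T) = 0), and one in which the theme is independent of the rheme (the rheme distribution is split evenly over two themes).

```python
import math


def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)


def information_balance(joint):
    h_tr = entropy([p for row in joint for p in row])
    h_t = entropy([sum(row) for row in joint])
    h_r = entropy([sum(col) for col in zip(*joint)])
    e = h_tr / 2
    return abs(h_t - e), abs(h_r - e)   # (sigma_TR, sigma_RT)


r = [0.275, 0.15, 0.15, 0.15, 0.275]
predictable_theme = [r]                                     # one theme choice: H(T) = 0
independent = [[0.5 * p for p in r], [0.5 * p for p in r]]  # I(T;R) = 0, H(T) = 1

for name, joint in [("H(T)=0", predictable_theme), ("I(T;R)=0", independent)]:
    s_tr, s_rt = information_balance(joint)
    print(f"{name}: sigma_TR = {s_tr:.3f}, sigma_RT = {s_rt:.3f}")
# H(T)=0: sigma_TR = 1.128, sigma_RT = 1.128
# I(T;R)=0: sigma_TR = 0.628, sigma_RT = 0.628
```

In both cases the two orderings are equally balanced, so the information balance no longer favors placing the theme first.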
In summary, assuming that the theme has a lower entropy than the rheme, the theme-rheme ordering is never worse than the other with respect to information balance. Exceptions to the theme-first principle occur when the theme is completely predictable, i.e., $H(T) = 0$.

3 Analysis of Rheme-First Cases 

In this section, we examine various rheme-first cases. The first subsection deals with exceptions in English, a language that is not considered rheme-first. The second subsection deals with examples in an arguably rheme-first language.

3.1 Exceptions in English 

In example (1), the theme is completely predictable. Thus, its entropy is zero. As a result, it falls into the special case discussed in the previous section, where the position of the theme does not affect the information balance. Exceptions to strict theme-first principles like this are still consistent with the present hypothesis.

There is another point regarding the status of a contrastive theme, as in the following example.³

(9) Q: Well, what about the beans? Who ate them? 

A: [Fred]_Rheme [ate the beans]_Theme

Here, the word “beans” is stressed because of the potential contrast between 
beans and, say, potatoes. One might question whether the entropy of such a 
theme is zero. But as long as the theme is completely predictable as in the above 
example, its entropy is still zero. Thus, the above example is consistent with our 
analysis. The existence of contrastive elements does not necessarily increase the 
entropy. In this respect, entropy computation is different from analyzing the set 
of alternatives as discussed in Steedman [22]. 

Lambrecht argues that contrastive themes (his topics) must appear sentence-initially because they must announce a new topic or mark a topic shift [1]. But
example (9) is a counterexample to this analysis. Unlike Lambrecht, the present 
hypothesis predicts and accepts the existence of a contrastive theme after the 
rheme as long as it has zero entropy. 

In written texts in English, it is generally more difficult to find a rheme-first 
pattern. Here is an attempt to create a text comparable to example (9). 

(10) i. Once upon a time, the villagers planted beans and potatoes. One day, they noticed that someone ate the beans. Someone must have eaten them.
ii. Fred ate the beans.

iii. Fred was a monk who ...

Although utterance (10i) provides basically the same information as question (9Q), utterance (10ii), which is the same as (9A), sounds less natural in this

³ Predicates like “eat” imply the existence of a (possibly deleted) event argument
[20], which may affect the information-theoretic analysis [21]. This situation can be 
avoided by using another type of verb, such as “know.” 
text. An alternative, “the one who ate beans was Fred,” sounds more natural. This suggests that the entropy of “ate the beans” is not zero. Unlike the context generated by a question, utterances in a written text tend to leave a variety of options after them. This seems to explain why the theme-first tendency is observed more commonly in the written form of English.

The present hypothesis predicts the following: it is preferable for an unpredictable theme to precede a rheme. However, it is always possible to violate such a preference. As an example, consider the following abstract taken from a medical journal (utterances are numbered for reference purposes).

Title: ⁰Overuse Injuries in Children and Adolescents

¹The benefits of regular exercise are not limited to adults. ²Youth athletic programs provide opportunities to improve self-esteem, acquire leadership skills and self-discipline, and develop general fitness and motor skills. ³Peer socialization is another important, though sometimes overlooked, benefit. ⁴Participation, however, is not without injury risk. ⁵While acute trauma and rare catastrophic injuries draw much attention, overuse injuries are increasingly common.

In utterance 3, of the two phrases (A) “peer socialization” and (B) “another
important, though sometimes overlooked, benefit,” phrase (B) seems to connect
more strongly to the context because the word “benefit” already appeared in
utterance 1. The choice of “benefit” is made from a small set of contextually
linked alternatives, whereas the choice of “peer socialization” is made from a
much more diverse set of possibilities. Hence, the entropy of phrase (B) must be
lower than that of phrase (A). If the phrases were reversed, as in “Another
important, though sometimes overlooked, benefit is peer socialization,” the
information load of this utterance would be better balanced, and the utterance
would be more appropriate in this context than the original utterance 3.
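A small numerical illustration of this comparison, using Shannon entropy H = −Σ p log₂ p. The candidate sets and probabilities below are hypothetical values chosen only to mirror the argument (a few contextually linked options for (B), many open options for (A)); they are not estimates derived from the abstract, and the paper's own computation of entropy may differ.

```python
import math


def entropy(dist):
    """Shannon entropy, in bits, of a {outcome: probability} distribution."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)


# Hypothetical distribution for phrase (B): "benefit" was already evoked in
# utterance 1, so only a few contextually linked options compete.
phrase_b = {"benefit": 0.8, "advantage": 0.1, "drawback": 0.1}

# Hypothetical distribution for phrase (A): many equally likely new topics.
topics = ["peer socialization", "teamwork", "discipline", "confidence",
          "fitness", "coordination", "time management", "leadership"]
phrase_a = {t: 1 / len(topics) for t in topics}

h_a, h_b = entropy(phrase_a), entropy(phrase_b)
print(f"H(A) = {h_a:.2f} bits, H(B) = {h_b:.2f} bits")  # H(A) = 3.00 bits, H(B) = 0.92 bits
# Under the hypothesis, the lower-entropy phrase is the better theme.
lower = "B" if h_b < h_a else "A"
print("Lower-entropy phrase:", lower)  # Lower-entropy phrase: B
```

A fully predictable constituent would have a single outcome with probability 1, and hence an entropy of zero.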

3.2 Rheme-First Languages

Although some have claimed certain languages to be rheme-first, we need to be
careful about identifying rheme-first patterns. First, depending on how it is
defined, classifying a language as verb-initial may simply mean that the
verb-initial pattern occurs more frequently than others. Second, being
verb-initial does not automatically mean that the language is full of
rheme-first patterns [23].

The discussion below focuses on Iroquoian data taken from Mithun [8], which
seems to represent the most prominently rheme-first case (newsworthiness-first,
to use her term). The utterances are taken from Tuscarora stories. The
background is as follows: the speaker first describes a long journey on the ice,
the discovery of land, and the preparation for a sacrifice (some phonetic symbols
have been replaced for font availability reasons: “9” for right-hooked schwa and
“f” for glottal stop).

(11) i. [ha? uhq,?nq? ruinqiqh] Rheme, wahrqhrg?, ...
the head man he said

“the headman said, ...”

(after the sacrifice is made)




Contextual Effects on Word Order 199 



ii. §:waeh tihruyqhwfqh haem:kg,: uhqfnq? ru?ng,?gh? 

where he has learned from that head man 
“Where had he learned it, that headman?” 

(the speaker begins his recipe for cornbread)

iii. Tyahraetsihg kg:9 [uhsaeharceh] Rheme wafkkuhae?.

first customarily ash I went after

“First, I usually would go after ashes.”

(after a kettle is prepared and is boiling)

iv. U:ng kg:9 [yahwa?kkg?nae:ti?] Rheme hdjthu hafuhsaeharaeh. 

then customarily there I poured there the ash 

“Then I would pour the ashes in there.” 

We exclude utterance (ii) from the discussion because the analysis of the
information structure of questions is beyond the scope of this paper. Note first
that (iii) and (iv) include an adverbial at the beginning of the utterance; thus,
strictly speaking, they do not have rheme-first patterns. On the other hand, the
last constituent in each utterance is part of the theme. Thus, we consistently
see a kind of rheme-theme pattern, which is strikingly different from more
strongly theme-first languages. The constituents after the rhemes are either a
pronoun, a definite expression, or a fairly light verb. That is, these
constituents are highly predictable, and their entropies are very low, if not
zero.

Let us examine other utterances from the same story. The following is an 
introductory sentence to begin a war story. 

(12) Umgha? kyaenvkg: tikahd:wi? kyaenvkg: [kayglrvyus 

long ago this so it carries this they fight 

kyaenvkg: wahstghd:ka:?, tisng? kurdhku:] Rheme

this Bostonians and British 

“One time long ago the Americans and the British were at war.” 

This is in fact a theme-rheme pattern. The theme is a typical element used to 
begin a story. The verb-subject order within the rheme is beyond the scope of 
the present analysis. 

In the following, a peddler had been driving a horse, although the horse itself 
is not mentioned. Mithun argues for the newsworthiness of the verb. 

(13) U:ng haesng: [9 ahra?nu:ri?] Rheme ha?d:ha:9.
now then again he drove the horse 
“Now then he drove his horse again.” 

Again, this is not strictly rheme-first, and the constituent “the horse” is
predictable from the context.

Mithun does not discuss the context for the following, but says that the focal 
point is “behind her.” 

(14) [aeltaehsnakw] Rheme wahra?nd?nihr. 

behind her he stood 

“He stood behind her.” 
The information “he stood” must be predictable. In the next example, although
“in front” is probably not completely predictable, it seems to have low entropy,
being readily inferrable from “behind.”

(15) [Yu:?naeks] Rheme uhg,?ng,?.
it burns in front 

“A fire was burning before her.” 

Mithun cites the literature and observes that in spoken language, significant
new ideas are introduced one at a time [8]. For the above examples, we could
even say that the story can continue by linking the rhemes (and the themes
preceding them) while omitting the constituents after the rhemes. Thus, in
these rheme-theme patterns, we can still see zero-entropy themes after the
rhemes. This observation is consistent with the present hypothesis.

Why there are (more or less) rheme-first languages, and why there are so few of
them, are intriguing questions. As a cognitive motivation for the rheme-first
pattern, Downing refers to the primacy effect [24]. Mithun adds that the
sentence-initial position has the advantage of being more prominent prosodically
because of downstepping (gradually decreasing pitch) [8]. However, since even
Iroquoian allows sentence-initial adverbials as part of the theme, neither of
these proposals seems convincing. Finally, Mithun points out that the arguably
rheme-first languages are highly agglutinating, with a small number of
constituents in each utterance, and that the development of affixes may have
affected the different degrees of rheme-first tendency in the Siouan, Caddoan,
and Iroquoian languages [8]. Additional relevant data can be found in the
literature [25-28]; their analysis is left for future work.

4 On the Definition of Information Structure 

In this section, we turn our attention to the definition of information structure. 
Although researchers broadly agree on the notion of information structure, its
precise definition is still a matter of controversy. This section proposes yet
another definition, because it differs considerably from the previous ones and
could provide a precise foundation for them.

4.1 Previous Definitions 

The most common way of analyzing information structure is to use a question 
test, as already seen in example (1). We could even define information structure 
based on a question test. However, such a definition cannot be applied to analyze 
information structure in texts. Another popular definition by Halliday [4] is 
problematic, because it is limited to the theme-rheme order. 

Lambrecht provides a more general definition as shown below [1]. 

That component of sentence grammar in which propositions as con- 
ceptual representations of states of affairs are paired with lexicogram- 
matical structures in accordance with the mental states of interlocu- 
tors who use and interpret these structures as units of information 
in given discourse contexts. 
This definition appears intuitive, but it still does not pin down the concept
precisely. In particular, its reference to mental states seems to leave room
for further specification.

Although the referential status of the rheme can vary, there are certain re- 
strictions on the referential status of the theme. Themes are in general evoked 
or inferrable in the sense of Prince [29]. However, it is extremely difficult to 
pinpoint to what extent we can actually infer a theme from the context. Any 
definition of information structure based on the referential status of the theme 
would face this problem. 



4.2 Information-Theoretic Definition 

One of our assumptions is that the theme has lower entropy than the rheme. In 
this section, we attempt to define information structure based on this idea. Here 
is our definition:

Definition 2. The information structure of an utterance is the linguistic real- 
ization of a binary partition (composition) of the semantic representation of the 
utterance between theme and rheme, such that the entropy of the rheme is greater 
than that of the theme. 
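Read operationally, Definition 2 acts as a filter on candidate partitions. The sketch below assumes that the grammatically feasible partitions and their entropies are already available (the definition deliberately leaves the computation of entropy to the analysis of inference). The `Partition` class, the question “Who ate the beans?” and the entropy values are hypothetical illustrations, not part of the paper.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """A binary theme/rheme partition with (assumed) precomputed entropies."""
    theme: str
    rheme: str
    h_theme: float  # entropy of the theme, in bits
    h_rheme: float  # entropy of the rheme, in bits


def satisfies_definition_2(p: Partition) -> bool:
    """Definition 2 only requires the rheme to carry more entropy than the theme."""
    return p.h_rheme > p.h_theme


# Hypothetical partitions of "Fred ate the beans" as an answer to
# "Who ate the beans?"; the entropy values are illustrative only.
candidates = [
    Partition(theme="ate the beans", rheme="Fred", h_theme=0.0, h_rheme=2.5),
    Partition(theme="Fred", rheme="ate the beans", h_theme=2.5, h_rheme=0.0),
]

for p in candidates:
    print(f"theme={p.theme!r}, rheme={p.rheme!r}: {satisfies_definition_2(p)}")
```

Only relative entropies matter here, which matches the point that the definition does not refer to absolute properties of the theme or the rheme.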

Let us examine some of the prominent features of this definition. First, it
assumes a binary partition. We also assume that only grammatically feasible
partitions are considered. For example, such a partition can be represented using
Combinatory Categorial Grammar as discussed in Steedman [22].

Definition 2 refers only to the relative entropies of the theme and the rheme
and does not directly refer to absolute properties of either. As mentioned in
Section 2, the computation of entropy would eventually depend on the analysis
of inference. Thus, the various problems of dealing with inference will not go
away. However, it seems advantageous to abstract away from the difficulty of
inference, as it can be left entirely to the computation of entropy.

Except for the binary partition requirement, Definition 2 does not refer to
linguistic notions such as reference to a verb or the argument-adjunct
distinction (cf. [7, 1]). As a result, the definition can be applied robustly to
any construction in any language.

Since Definition 2 is based on entropy, which evaluates to a numeric value, it
can be compared with our own judgments about information structure, which are
occasionally graded rather than clear-cut. In many cases, it is difficult to
analyze information structure, especially in a written text. A theory of
information structure may actually need to fail gracefully if the situation is
not clear-cut. Unlike previous definitions, the present approach accepts such a
possibility. Furthermore, the use of a probability distribution still allows us
to assign small probabilities to unexpected outcomes. This can be used to account
for unexpected options and indirect responses to a question.

This definition is not compatible with recursive analyses of information structure 
including [4]. More details on this point are available in [6]. 
5 Conclusion

This paper proposes the hypothesis that information structure serves to even out
the information load of the theme and the rheme (information balance). Assuming
that the theme is the low-entropy component of an information structure, we
show that placing the theme before the rheme is, in this respect, never worse
than the opposite order. A natural consequence is the theme-first tendency. One
interpretation is that information structure is a way to minimize the required
channel capacity.

The rheme-first examples are analyzed as involving zero-entropy themes.
Since the information balance is not affected by the position of such themes,
these examples are still consistent with our proposal. The paper also discusses a
new definition of information structure as informational contrast between theme
and rheme, which can serve as the basis for the entire discussion of this paper.

The current proposal is to some extent consistent with many other proposals
about the relation between word order and information structure. However, the
proposal is novel in that it relates certain word-order phenomena directly to
the notion of entropy, which is widely applied in various fields, including
linguistics. This approach also opens the possibility of applying
psycholinguistic/cognitive techniques for further evaluation. The proposal is
arguably the first to derive both the theme-first tendency and seemingly
exceptional cases from a single hypothesis. This is desirable, as we can now
account for more diverse phenomena with fewer principles.
Abstract. In conventional security systems, protected resources such as
documents, hardware devices and software applications follow an On/Off
access policy: On grants access and Off denies it. This access policy is
principally based on the user's identity and is static over time. As
applications become more pervasive, security policies must become more flexible
in order to respond to these highly dynamic computing environments. That is,
security infrastructures will need to be sensitive to context. To meet these
requirements, we propose a conceptual model for context-based authorizations
tuning. This model offers fine-grained control over access to protected
resources, based on a set of information about the state of the user and the
environment.



1 Introduction 

Research in the security field covers many aspects, such as improving
cryptographic algorithms to make them more resistant to attacks, implementing
new authentication methods, and designing access control mechanisms. In
traditional security systems, the security policy is pre-configured to a static
behavior and cannot be seamlessly adapted to new constraints. This situation is
due to the lack of consideration for context in existing security systems.
Correspondingly, there is a lack of clearly defined conceptual models of context
and of software architectures for such systems.

The goal of our research is to develop a conceptual framework for context-based
security systems. Context-based security aims at adapting the security policy to
a set of relevant information collected from the dynamic environment. As the
environment evolves, the context changes: some contextual elements are
integrated into the proceduralized context, while others leave it [?]. Thus,
security policies change dynamically in order to cope with new requirements.
Our model is intended to handle all the components of a security system,
including authentication, privacy and authorization, and to be easily extensible
to new security modules. However, this article mainly discusses access control
to resources, or authorizations, in distributed systems, where a set of
independent computers and devices communicate via a network in order to share
data and services.

The structure of the paper is as follows. Section 2 discusses security issues
in ubiquitous applications. The main contributions of the paper are presented
in Section 3. The foundations for designing context-based authorization
frameworks are laid in Section 4. Section 5 concludes this paper with an outline
of future research directions.

2 Security Issues in Ubiquitous Applications 

The use of widely distributed resources provides huge potential for expanding
the way that people and businesses communicate and share data, provide services
to clients and process information to increase their efficiency. This broad
access has also brought with it new security vulnerabilities. Current security
systems assume a given, static framework, whereas attacks generally try to
bypass the static conditions under which these systems are effective.
Surprisingly, security has often been the last requirement considered in
designing such dynamic environments. This situation is due to the high cost of
security infrastructures, export controls on cryptography technologies and the
lack of security experts for specific applications [?]. This is particularly
true inside corporate networks, where a firewall is assumed to keep all hackers
out [?]. Firewalls are, however, not sufficient to protect shared resources. The
main function of a firewall is to block unwanted traffic and hide vulnerable
internal-network systems. It provides no data integrity and does not check
traffic that is not sent through it, which means that it cannot protect the
corporate network from internal attacks. People inside the network may
maliciously or unintentionally reveal critical data to unauthorized users or
disrupt the proper functioning of the system. In conclusion, firewalls should
always be viewed as a supplement to a strong security policy.

According to Merriam-Webster, a policy is “a definite course or method of
action selected from among alternatives and in light of given conditions to guide
and determine present and future decisions.”

In the same spirit, we define a security policy as “a set of rules that control
the behavior of all the security components acting on the framework to be
secured.” The security policy must be concise, descriptive and easily
implementable. Security components consist of access control lists,
cryptographic algorithms, and user authentication tools. They act at two
security levels: the network level and the application level.

2.1 The Network Level 

Networks are all about the sharing of data and applications. In recent years,
network security breaches have increased in frequency and, more importantly, in
severity. Security in these environments is thus a prerequisite. In practice,
there is no single technique to ensure reliable networks; instead, different
technologies (firewalls, encryption, etc.) are combined to counter security
attacks. Such combinations extend security frameworks but remain static. The
next generation of the Internet Protocol (IPv6) is intended to add new security
features at the network level compared with its predecessor.



2.2 The Application Level 

Network-based security suffers from some limitations on the kind of security
checks that can be performed. The reason is that network-based security systems
do not operate at a high level of data abstraction and cannot interpret the
content of the traffic. They only know about hosts, addresses, and other
network-related concepts. Application-based security, in contrast, is intended
to provide a security layer based on user roles and identity, along with other
high-level concepts such as protected resources and access policies. Our
context-based security model is intended to operate at the application level,
mainly because it is much cheaper to reconfigure the security infrastructure at
the application level than at a lower (network) level. This does not mean that
there is no need for network-level security.

Other requirements must be taken into account regarding the security policy.
Following the definition given above, the security policy is intended to manage
all the security components of the distributed system, namely the
authentication, authorization, integrity and confidentiality modules, and it
must be easily extensible to manage newly integrated modules. In line with the
aim of our work, the security policy must also be reconfigurable depending on the
context of the user and of the application environment. This leads to the
definition of a context-based security policy. Due to the pervasive nature of
recent distributed environments, an additional requirement is the definition of
shared policies. These features are detailed in the following section.

3 Research Aim and Scope 

Work addressing security issues in pervasive computing basically provides
technical solutions such as authentication, access control, integrity and
confidentiality, and the resulting security models are generally static. That
is, they are built according to already identified threats, and the resulting
infrastructures are thus very difficult to adapt to new threats. This work,
rather, focuses on a new aspect of security. We believe that more secure systems
can be achieved by giving them the ability to automatically adapt their security
policy to new constraints. These constraints are dictated by the user's and the
application's environment. Figure 1 illustrates this idea. The distributed
application is controlled by an initial security policy in an initial context.
This context changes continually in response to triggers (dynamic changes in
the environment). The security policy must then adapt itself to the new context.
Our approach thus combines the two fields of context-aware computing and
security in pervasive computing in order to provide the foundations for
“context-based security.”



3.1 Related Work 

Integrating security with context-aware environments is a recent research
direction. Most of the efforts are directed at securing context-aware
applications. In [?] and [?], Covington et al. explore new access control models
and security policies to secure both information and resources in an intelligent
home environment. Their framework makes use of environment roles [?]. In the
same direction, Masone designed and implemented RDL (Role-Definition Language),
a simple programming language to describe roles in terms of context information
[3]. There have also been similar initiatives in [?] and [?].

Interestingly, we observed that all previous work on combining security and
context-aware computing follows the same pattern: using contextual information
to enrich the access control model in order to secure context-aware
applications, with a focus on specific applications.







3.2 Contributions of This Paper 

Compared with the previous contributions discussed in Section 3.1, our work is
about contextualizing security rather than securing context-aware applications.
Even if the difference is not yet fully apparent at this stage, where we begin
by describing the overall architecture, fundamental differences will emerge in
future contributions when each component is detailed. As a preliminary example,
we cite the new concept of “security context” introduced in Section 4.3.
We summarize below the main contributions of our research:

1. To lay down the minimal foundations for a generic context-based security
framework with a focus on the software architecture. Generic means providing
the minimal software architecture that can be easily extended to build more
specific applications. In addition, context-related modules and resources are
loosely coupled, allowing resources to be added or removed and their respective
access policies to be modified in a transparent manner. This is an appropriate
choice for highly dynamic environments.

2. The second main contribution is to provide a way of managing federations
of resources following a specific global policy (an organization's policy, for
instance), where resources and services join and leave the federation in an ad
hoc manner. That is why access policies are organized by resource type and
their corresponding actions (see the example in Section 4.1).

3. The third main contribution is to require specific authentication methods
depending on a partial context built from the state of the federation.

The resulting prototype will then be designed with the following requirements
in mind:

— Provides a framework for the rapid prototyping of context-based security
systems,

— Handles both simple and high-level contextual information related to
security,

— Is easily extensible to manage new protected resources,

— Is easily reconfigurable to adapt to new access policies,

— Allows a customizable (context-based) method of authentication
(username/password, certificates, etc.) by requiring specific credentials
depending on a partial context,

— Is transparent to both resources and requesting clients: no a priori
knowledge of the federation policy is needed.

The following section describes the overall architecture. 

4 Context-Based Authorizations Tuning 

The term context-aware computing was first introduced by Schilit et al. in 1994
[?] to describe software that “adapts according to its location of use, the
collection of nearby people and objects, as well as the changes to those objects
over time.”

Another definition, given by Dey in [?], states that “A system is context-aware
if it uses context to provide relevant information and/or services to the user,
where relevancy depends on the user's task.”

Context awareness is now a well-established research area, with dedicated
conferences such as those on ubiquitous computing and pervasive computing.
In the field of context-aware computing, context is generally defined as
physical parameters (location, temperature, time, etc.) obtained from sensors.
However, the user is not really considered in these approaches. In this sense,
context is generally managed as a layer between an application and the external
world, a type of middleware. Conversely, there is another approach in which the
user (through his knowledge and reasoning) is central in the modelling of
context. In this second area, knowledge and reasoning in the accomplishment of a
task are described in a context-based formalism, i.e. inside the application
itself (see, e.g., the contextual graphs in [?]).

Our research draws on both approaches, with an application to the security
infrastructure of ubiquitous applications. We concentrate on the main part of
the security framework of a distributed system, namely access control to shared
resources. The framework makes use of RBAC (Role-Based Access Control): users
are assigned to roles based on their credentials and competencies [?].
Role-based access is more suitable for pervasive environments since it
simplifies the administration of permissions; updating roles is easier than
updating permissions for every user individually [?].
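The administrative advantage of RBAC mentioned above shows up in a minimal sketch where permissions attach to roles and users attach only to roles. The user names and the (resource type, action) permission pairs are hypothetical, and this is plain RBAC, not the context-aware extension the paper builds on top of it.

```python
# Role -> set of permitted (resource_type, action) pairs.
role_permissions = {
    "administrator": {("document", "read"), ("document", "write"),
                      ("document", "delete")},
    "guest": {("document", "read")},
}

# User -> role; users never hold permissions directly (hypothetical users).
user_roles = {"alice": "administrator", "bob": "guest", "carol": "guest"}


def is_permitted(user: str, resource_type: str, action: str) -> bool:
    """A user may act only through the permissions of the assigned role."""
    role = user_roles.get(user)
    return role is not None and (resource_type, action) in role_permissions.get(role, set())


print(is_permitted("bob", "document", "write"))    # False
# Granting write access to every guest is a single update to the role.
role_permissions["guest"].add(("document", "write"))
print(is_permitted("carol", "document", "write"))  # True
```

With per-user permissions, the same change would require editing the record of every guest.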

4.1 A Case Study 

To illustrate the main functionalities of the proposed architecture, we consider
a simple example, which will be developed along with the definition of each
component.

We consider a protected document that offers the following operations: read,
write and delete. Depending on their credentials and identity, requesting users
are assigned a priori to one of two roles: administrator or guest. The document
is available on the network, and its access policy is defined and stored in the
context engine. In our model, resources are managed by a specific access policy
depending on their type, i.e., the type of service they provide.

The actual access policy is defined using a rule-based formalism with a
simplified grammar (no explicit If/Then clauses). Rule-based reasoning is an
area of artificial intelligence (AI) in which human behavior is simulated when
presented with a new case requiring some action. This approach is used here to
specify context-based access policies in order to grant or deny access to
resources (see [?] and [?] for more information).

We consider all resources to be protected by default; thus, their corresponding
policies express only the cases in which access is granted (which justifies the
lack of If/Then clauses). This design choice aims at simplifying the
specification of policies. Here is an example:

A Simple Access Policy 

Resource_type = document;

Action = read ((Role = administrator) OR
               (Role = guest; (Date = Weekdays AND Time = between 8:00-18:00)));
Authentication = username/password;

Action = write (Role = administrator);



210 



G. Kouadri Mostefaoui and P. Brezillon 



Authentication = username/password;

Action = delete (Role = administrator; Date = Weekends);
Authentication = certificates;

Each shared resource defines access rules for each individual operation. The
authentication tag specifies the authentication method required in the current
partial context. Access is granted only when the complete context is built,
i.e., when the conditions are satisfied and the corresponding authentication
phase has succeeded. The pattern used above eases further updates of the access
policy by adding or removing conditions. Defining access policies manually is a
cumbersome task in real applications with complex relationships among roles.
This process can, however, be performed visually using a graphical interface.
In [?], Covington et al. propose a graphical policy editor for specifying
available roles, their relationships, and policy rules.
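One way to read the example policy is as a default-deny table: each action lists alternative conditions (the OR), each condition bundles its constraints (the semicolons and AND), and a matching rule yields the authentication method required in the partial context. The sketch below encodes that reading; the dictionary layout, the `day_type` key (holding an interpreted value such as “Weekday”), the inclusive 8:00-18:00 bound and the `authenticate` callback are our assumptions, not the authors' grammar or engine.

```python
from datetime import time

# Each action lists alternative conditions (any one may grant access) plus the
# authentication it requires. Anything not listed is denied by default.
DOCUMENT_POLICY = {
    "read": {
        "conditions": [
            lambda c: c["role"] == "administrator",
            lambda c: (c["role"] == "guest"
                       and c["day_type"] == "Weekday"
                       and time(8, 0) <= c["time"] <= time(18, 0)),
        ],
        "authentication": "username/password",
    },
    "write": {
        "conditions": [lambda c: c["role"] == "administrator"],
        "authentication": "username/password",
    },
    "delete": {
        "conditions": [lambda c: (c["role"] == "administrator"
                                  and c["day_type"] == "Weekend")],
        "authentication": "certificates",
    },
}


def required_authentication(policy, action, context):
    """Return the authentication method demanded by the partial context,
    or None when no rule grants the action (default deny)."""
    rule = policy.get(action)
    if rule and any(cond(context) for cond in rule["conditions"]):
        return rule["authentication"]
    return None


def decide(policy, action, context, authenticate):
    """Grant only if a rule matches and the required authentication succeeds.
    `authenticate` is a hypothetical callback: method name -> bool."""
    method = required_authentication(policy, action, context)
    return method is not None and authenticate(method)


guest_ctx = {"role": "guest", "day_type": "Weekday", "time": time(10, 30)}
print(required_authentication(DOCUMENT_POLICY, "read", guest_ctx))    # username/password
print(required_authentication(DOCUMENT_POLICY, "delete", guest_ctx))  # None
```

Because only granting cases are written down, adding or removing a condition is a local change to one list, which is the maintainability argument made above.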

Based on the above example, we present the main parts of the security ar- 
chitecture. 



4.2 Protected Resource 

We consider three types of resources: hardware devices, physical resources
(documents, databases) and software resources (operations on a software object
or data structure). In order to fit within our model, each resource must respect
the following structure (see Fig. 2):











Fig. 2. Structure of protected resources 



— Any interaction with the resource is performed via an interface that presents
the set of all actions available for the resource,

— Resource actions contain an additional flag that accepts only two states: true
or false. This flag is used to allow or deny the operation on the resource,

— Each resource is protected by default (all of its corresponding actions have
a flag value equal to false).

In our example, a user whose role is guest invokes the read action on the
document via the interface attached to it. The protected resource identifies the
request as coming from a client and then forwards it to the context engine.
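The structure of a protected resource and the guest's read request can be sketched as a small class. This is a hypothetical illustration: the class and method names, the toy engine, and resetting the flag after each request are our assumptions. The paper specifies only the action interface, one true/false flag per action, protection by default, and forwarding to the context engine.

```python
from typing import Callable, Dict


class ProtectedResource:
    """Sketch of the structure in Fig. 2: an interface exposing the resource's
    actions, each guarded by a true/false flag that starts out False."""

    def __init__(self, resource_type: str,
                 actions: Dict[str, Callable[[], str]],
                 context_engine: Callable[[str, str, dict], bool]):
        self.resource_type = resource_type
        self._actions = actions
        self.flags = {name: False for name in actions}  # protected by default
        self._engine = context_engine

    def invoke(self, action: str, request: dict) -> str:
        if action not in self._actions:
            raise KeyError(f"unknown action: {action}")
        # Forward the client's request to the context engine, whose decision
        # sets the action's flag to True (allow) or leaves it False (deny).
        self.flags[action] = self._engine(self.resource_type, action, request)
        try:
            if not self.flags[action]:
                return "access denied"
            return self._actions[action]()
        finally:
            # Assumption: the flag is reset so each request is decided afresh.
            self.flags[action] = False


# Usage with a toy engine standing in for the security context engine.
doc = ProtectedResource(
    "document",
    {"read": lambda: "document contents",
     "write": lambda: "written",
     "delete": lambda: "deleted"},
    context_engine=lambda rtype, action, req: (
        req.get("role") == "administrator" or action == "read"),
)
print(doc.invoke("read", {"role": "guest"}))    # document contents
print(doc.invoke("delete", {"role": "guest"}))  # access denied
```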

4.3 The Security Context Engine 

The context engine has two responsibilities: modelling contextual entries in
order to build a security context, and mapping the security context to the
corresponding authorizations on resources. Modelling context requires picking
out the most relevant features to reduce it to a meaningful representation
[?]. We propose two classifications of contextual entries, according to their
representation aspect and their temporal aspect:

Representation Aspect 

— Simple: The collected information is used in its original format. For
example, it can represent the value of a parameter,

— Interpreted: The collected data cannot be used as is but needs to be
converted into a more meaningful format. For example, the contextual entry
“Sunday” needs to be converted into “Weekday” or “Weekend”,

— Composite: It is a set of simple and/or interpreted entries collected as a
whole.



Temporal Aspect 

— Static: It describes contextual information that is invariant, such as a per- 
son’s date of birth, 

— Transient: The value of a transient contextual entry is updated at run-time 
and does not need to store information about its past state. For example, 
time, date, etc, 

— Persistent: Some entries must store historical data about their past state. 
Persistent contextual entries need to be marked with a time stamp. 
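Since the two classifications are independent, one way to model them is as two attributes of every contextual entry. The sketch below is a hypothetical data model (the class names, the `update` method and the `interpret_day` helper are ours, not the paper's); it shows the interpretation of “Sunday” and the time-stamping of persistent entries.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum, auto
from typing import Any, List, Tuple


class Representation(Enum):
    SIMPLE = auto()       # used in its original format
    INTERPRETED = auto()  # converted into a more meaningful format first
    COMPOSITE = auto()    # a set of simple and/or interpreted entries


class Temporality(Enum):
    STATIC = auto()       # invariant, e.g. a person's date of birth
    TRANSIENT = auto()    # updated at run time, no past state kept
    PERSISTENT = auto()   # keeps time-stamped historical states


@dataclass
class ContextualEntry:
    name: str
    value: Any
    representation: Representation
    temporality: Temporality
    history: List[Tuple[datetime, Any]] = field(default_factory=list)

    def update(self, new_value: Any, when: datetime) -> None:
        if self.temporality is Temporality.STATIC:
            raise ValueError(f"{self.name} is static and cannot be updated")
        if self.temporality is Temporality.PERSISTENT:
            self.history.append((when, new_value))  # time-stamp each state
        self.value = new_value


def interpret_day(day_name: str) -> str:
    """Interpretation step for the example entry: 'Sunday' -> 'Weekend'."""
    return "Weekend" if day_name in ("Saturday", "Sunday") else "Weekday"


day = ContextualEntry("day", "Sunday", Representation.INTERPRETED, Temporality.TRANSIENT)
print(interpret_day(day.value))  # Weekend
```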



Building a Security Context. Our model relies on a set of contextual in- 
formation relevant to security. This set forms what we call a security context. 
Designing context in general is not easy, and designing a security context
suffers from the same problem. We present here a tentative definition of the
security context.

A security context is a set of information collected from the user’s en- 
vironment and the application environment and that is relevant to the 
security infrastructure of both the user and the application. 
Here, relevant means having a direct or indirect effect on the security policy.
In the present work, we are dealing with authorizations, so the contextual
information is more precisely relevant to authorizations on available resources.
What is really relevant is not fully predictable in advance and depends on the
application. This information may include user identity, membership, resource
location, date, time, the user's interaction history with the system, social
situation, etc.

In our example, the context engine extracts the operation type and the
client’s role from the received message. It then builds a partial context based
on the client role and the access policy of the requested resource. To build a
partial context, the engine retrieves the suggested contextual information from
the context bucket. Following the policy defined in the corresponding section,
requests to the bucket ask for the date and time. The resulting partial context
requires a specific authentication method (a username/password method in our
example). Thus, credentials provided by the user (username/password,
certificates, etc.) are additional sources of contextual information. Once the
complete security context is built, final actions are performed: access to the
document is granted (if authentication succeeded) or denied (if authentication
failed). This process is practically equivalent to setting the read operation
flag to true or false. Contextual data are received from the security bucket in a
primitive form and then interpreted at the context engine level. For example,
the date is represented as “Monday”, and it is the responsibility of the context
engine to interpret it as “Weekday”. This design requirement eases the reuse of
the context bucket by different applications with different interpretations of
the same contextual data.
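The walk-through above can be sketched as a small context engine. Everything concrete here is an assumption made for illustration: the `POLICY` table, the `ContextEngine` class and its methods, the office-hours rule that selects the authentication method, and the in-memory password store. The bucket is reduced to a dictionary of primitive values, which the engine interprets itself (for example “Monday” becomes “Weekday”).

```python
from typing import Dict, List, Tuple

# Hypothetical policy: contextual entries the engine must request from the
# bucket for a given (role, operation) pair.
POLICY: Dict[Tuple[str, str], List[str]] = {("student", "read"): ["date", "time"]}


class ContextEngine:
    def __init__(self, bucket: Dict[str, str], passwords: Dict[str, str]):
        self.bucket = bucket        # stand-in for the security context bucket
        self.passwords = passwords  # hypothetical username -> password store

    @staticmethod
    def interpret(raw: Dict[str, str]) -> Dict[str, object]:
        """Interpret primitive bucket values at the engine level."""
        day_type = "Weekend" if raw["date"] in ("Saturday", "Sunday") else "Weekday"
        hour = int(raw["time"].split(":")[0])
        return {"day_type": day_type, "office_hours": 8 <= hour < 18}

    def partial_context(self, message: Dict[str, str]) -> Dict[str, object]:
        needed = POLICY[(message["role"], message["operation"])]
        ctx = self.interpret({key: self.bucket[key] for key in needed})
        # The partial context determines which authentication method is required.
        in_hours = ctx["day_type"] == "Weekday" and ctx["office_hours"]
        ctx["auth_method"] = "password" if in_hours else "certificate"
        return ctx

    def read_flag(self, message: Dict[str, str], credentials: Dict[str, str]) -> bool:
        ctx = self.partial_context(message)
        if ctx["auth_method"] != "password":
            return False  # certificate checks are not modelled in this sketch
        # Credentials are an additional source of contextual information.
        user = credentials.get("username")
        return user in self.passwords and self.passwords[user] == credentials.get("password")


engine = ContextEngine({"date": "Monday", "time": "10:30"}, {"alice": "s3cret"})
request = {"role": "student", "operation": "read"}
print(engine.read_flag(request, {"username": "alice", "password": "s3cret"}))  # True
```

Keeping interpretation inside the engine, rather than in the bucket, mirrors the design requirement stated above: another application could read the same “Monday” value and interpret it differently.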



4.4 The Security Context Bucket 

One of the main problems of context is how to store it in a way that many
applications can use it. This is especially true in distributed applications, where
both the contextual information and the applications that need it are naturally
spread and shared ED). To store contextual entries, we investigate a central
storage point. All the security contextual data are collected into a logical
bucket: the security context bucket.

The security context bucket is a shared software data structure that pro-
vides a container for the security contextual information.

A similar approach has been proposed in m- 

At first sight, this approach may not seem well suited to distributed sys-
tems, since components interested in context (subscribers) are distributed over
different computing devices and developed by different programmers. This het-
erogeneity leads to different interpretations of the same context data. For
example, a user’s location may be interpreted by one component as a relative
distance (near, far) and by another component as an absolute location (using
coordinates). We argue that even if the storage medium is centralized, the
interpretation of the selected entries is performed in a distributed manner, at
the component level. In addition, the security bucket acts as a service that can
retrieve a given contextual entry on request. Thus, multiple security buckets
with the same functionality may exist in the network to ensure availability and
load balancing.
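A minimal sketch of both points, assuming a bucket replica can be modelled as an object that either returns an entry or raises `ConnectionError` when it is down. `BucketReplica`, `fetch`, the 100-unit radius and the coordinate values are hypothetical. Two subscriber functions then interpret the same primitive location, one as a relative distance and one as an absolute position.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


class BucketReplica:
    """Hypothetical replica of the security context bucket service."""

    def __init__(self, entries: Dict[str, object], up: bool = True):
        self.entries = entries
        self.up = up

    def get(self, key: str) -> object:
        if not self.up:
            raise ConnectionError("replica unavailable")
        return self.entries[key]


def fetch(replicas: List[BucketReplica], key: str) -> object:
    """Ask each replica in turn; any live replica can answer."""
    for replica in replicas:
        try:
            return replica.get(key)
        except ConnectionError:
            continue
    raise LookupError(f"no replica could provide {key!r}")


# Two subscribers interpret the same primitive entry in their own way.
def relative_distance(loc: Point, ref: Point, radius: float = 100.0) -> str:
    return "near" if math.dist(loc, ref) <= radius else "far"


def absolute_location(loc: Point) -> Dict[str, float]:
    return {"x": loc[0], "y": loc[1]}


replicas = [BucketReplica({}, up=False), BucketReplica({"location": (30.0, 40.0)})]
loc = fetch(replicas, "location")
print(relative_distance(loc, (0.0, 0.0)))  # near
print(absolute_location(loc))              # {'x': 30.0, 'y': 40.0}
```

`math.dist` requires Python 3.8 or later.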

The main advantage of this approach is that it eases scaling up the system
to a large number of contextual entries across a wide network. The second
advantage is robustness to failures, since contextual data are available from
different places. Finally, it eases the protection of contextual data.

The security context bucket offers the same advantages as encapsulation, a
key feature of the object-oriented paradigm. Object encapsulation is also known
as data hiding. It is a software mechanism that restricts access to code and
data to the methods that need it, rather than exposing them to everyone. In the
same manner, the context bucket hides its content, and access to it (read and
write operations) is subject to a security policy that manages interactions with
clients.
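The encapsulation analogy can be illustrated with a class whose storage is reachable only through `read` and `write` methods, each guarded by a policy callback. This is a sketch under assumptions: the `ContextBucket` class, the `(client_id, operation, key)` policy signature and the `agent:`/`engine:` naming convention are invented for the example. Note that Python enforces privacy only by convention (the leading underscore).

```python
from typing import Any, Callable, Dict

Policy = Callable[[str, str, str], bool]  # (client_id, operation, key) -> allowed?


class ContextBucket:
    """Contents are hidden behind read/write methods guarded by a policy."""

    def __init__(self, policy: Policy):
        self._entries: Dict[str, Any] = {}  # private by convention in Python
        self._policy = policy

    def write(self, client_id: str, key: str, value: Any) -> None:
        if not self._policy(client_id, "write", key):
            raise PermissionError(f"{client_id} may not write {key}")
        self._entries[key] = value

    def read(self, client_id: str, key: str) -> Any:
        if not self._policy(client_id, "read", key):
            raise PermissionError(f"{client_id} may not read {key}")
        return self._entries[key]


def agents_write_engines_read(client_id: str, operation: str, key: str) -> bool:
    # Hypothetical rule: gathering agents write, context engines read.
    if operation == "write":
        return client_id.startswith("agent:")
    return operation == "read" and client_id.startswith("engine:")


bucket = ContextBucket(agents_write_engines_read)
bucket.write("agent:clock", "time", "10:30")
print(bucket.read("engine:docs", "time"))  # 10:30
```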

Context entries are collected from the distributed network by means of a
group of agents. Gathering agents are mobile and launched by the security
bucket when requested. Their main role is to collect the needed contextual data
from their remote location by querying sensors, software applications, and the
environment. The content of the context bucket is essential in configuring the
security policy of the system, so this content must be protected.



Protection and Privacy of the Security Context Bucket. Designing
context-based security systems raises a dilemma. The more a context-based
security system knows about the user’s and the application’s environment, the
more fine-grained the access control it can provide to protected resources. On
the other hand, it becomes easier for attackers (at least theoretically) to com-
promise the security of the system, not directly (by attacking resources) but
indirectly, either by providing false contextual data to the bucket or by
accessing critical user information contained in it. The first can be achieved by
launching malicious gathering agents that provide corrupt data, and the second
by reading critical information, such as users’ private data, from the context
bucket.

Thus, to achieve protection and privacy of the security context, an additional
component is required in our architecture. The authentication module
authenticates both the entities that provide contextual entries (gathering
agents) and the entities that need access to the security context (context engines).
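One way such an authentication module could work is with message authentication codes over shared keys. The paper does not specify a mechanism; HMAC-SHA256 from the Python standard library is used here purely as an illustration, and the `AuthenticationModule` class, the key values and the payload layout are hypothetical. A tampered entry or an unregistered (possibly malicious) agent fails verification.

```python
import hashlib
import hmac
import json
from typing import Dict


class AuthenticationModule:
    """Verifies that data comes from a registered agent or engine (HMAC-SHA256)."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def register(self, entity_id: str, key: bytes) -> None:
        self._keys[entity_id] = key

    @staticmethod
    def sign(key: bytes, payload: dict) -> str:
        # Canonical JSON so that signer and verifier hash identical bytes.
        message = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hmac.new(key, message, hashlib.sha256).hexdigest()

    def verify(self, entity_id: str, payload: dict, tag: str) -> bool:
        key = self._keys.get(entity_id)
        if key is None:
            return False  # unknown agent or engine
        # Constant-time comparison avoids leaking information through timing.
        return hmac.compare_digest(self.sign(key, payload), tag)


auth = AuthenticationModule()
auth.register("agent-7", b"shared-secret-for-agent-7")
entry = {"key": "location", "value": "room 101"}
tag = AuthenticationModule.sign(b"shared-secret-for-agent-7", entry)
print(auth.verify("agent-7", entry, tag))  # True
```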

In addition, when a contextual entry is unavailable, the system must be able
to learn from previous experiences and propose an alternative.
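The paper leaves the learning mechanism open. As a deliberately simple stand-in, assumed here rather than taken from the source, the sketch below remembers every sensed value and, when sensing fails, proposes the most frequent past value while flagging it as an estimate. The `get_with_fallback` function and its `(value, estimated)` return convention are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple


def get_with_fallback(sense: Callable[[], Optional[str]],
                      history: List[str]) -> Tuple[Optional[str], bool]:
    """Return (value, estimated). Falls back to the most frequent past value."""
    value = sense()
    if value is not None:
        history.append(value)  # remember observations for later fallbacks
        return value, False
    if not history:
        return None, True      # nothing learned yet: no alternative to propose
    return Counter(history).most_common(1)[0][0], True


past = ["lab", "lab", "office"]
print(get_with_fallback(lambda: None, past))  # ('lab', True)
```

A real system would likely weight recent or context-similar observations more heavily; frequency counting is only the simplest possible choice.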

Collected contextual information is used by the context engine (described
in section EHJ) to build a security context and then to deduce the actions
to perform. The security bucket requirements are summarized as follows:

— The security context bucket has the ability to create, manage and authenti-
cate gathering agents,

— Contextual entries are sensed and stored in a primitive format that supports
all possible interpretations,

— Only authorized sources (gathering agents) are able to add/update data in
the security context bucket,

— Only authorized destinations (security context engines) are able to read the
bucket content,

— The security context bucket must maintain a history of its content at the
finest level of detail possible.

Fig. 3. The security context bucket

Figure 3 illustrates the main parts of the security context bucket:

1. The agents’ factory produces gathering agents in order to collect contextual 
data upon request, 

2. The contextual data repository is used to store gathered contextual data, 

3. An authentication module is also needed in order to authenticate gathering 
agents and requesting context engines. 
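The three parts can be combined into one illustrative class. This is a sketch, not the authors' implementation: mobile gathering agents are reduced to closures produced by a factory method, the authentication module is reduced to a membership check against identities the bucket itself issued or registered, and all names (`SecurityContextBucket`, `spawn_agent`, `register_engine`, `history`) are hypothetical. The repository keeps every time-stamped value, in line with the history requirement listed earlier.

```python
import itertools
from datetime import datetime
from typing import Any, Callable, Dict, List, Set, Tuple


class SecurityContextBucket:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._agents: Set[str] = set()   # agents issued by the factory
        self._engines: Set[str] = set()  # registered context engines
        self._repo: Dict[str, List[Tuple[datetime, Any]]] = {}

    # 1. Agents' factory: creates a gathering agent bound to one data source.
    def spawn_agent(self, key: str, source: Callable[[], Any]) -> Callable[[], None]:
        agent_id = f"agent-{next(self._ids)}"
        self._agents.add(agent_id)

        def run() -> None:
            self.store(agent_id, key, source(), datetime.now())
        return run

    # 2. Repository: every value is kept with its time stamp (full history).
    def store(self, agent_id: str, key: str, value: Any, when: datetime) -> None:
        self._authenticate(agent_id, self._agents)
        self._repo.setdefault(key, []).append((when, value))

    def register_engine(self, engine_id: str) -> None:
        self._engines.add(engine_id)

    def read(self, engine_id: str, key: str) -> Any:
        self._authenticate(engine_id, self._engines)
        return self._repo[key][-1][1]  # latest value

    def history(self, engine_id: str, key: str) -> List[Tuple[datetime, Any]]:
        self._authenticate(engine_id, self._engines)
        return list(self._repo[key])

    # 3. Authentication module (reduced here to a membership check).
    @staticmethod
    def _authenticate(entity_id: str, known: Set[str]) -> None:
        if entity_id not in known:
            raise PermissionError(f"unauthenticated entity: {entity_id}")


bucket = SecurityContextBucket()
clock_agent = bucket.spawn_agent("time", lambda: "10:30")
clock_agent()
bucket.register_engine("engine-docs")
print(bucket.read("engine-docs", "time"))  # 10:30
```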

Figure 4 illustrates the overall structure of the framework and the relationships
among the different components. Later changes in the access policy of protected
resources can be performed transparently by updating the corresponding policy
in the context engine. Resources can join or leave the distributed infrastructure
without disturbing the security infrastructure.
Fig. 4. Overall architecture

5 Conclusion and Future Work 

Security systems for distributed infrastructure are generally bound to static ac- 
cess policies that make them very difficult to adapt to new threats. This situa- 
tion is due to the lack of consideration for context in existing security systems. 
Context-based security is a recent research direction that aims at providing flex- 
ible security models for distributed infrastructures, where the user’s and appli- 
cation environments are continually changing. 

In this paper we have introduced a new model for context-based tuning of
authorizations in distributed systems. Much of this work focuses on providing
a generic minimal architecture based on loosely coupled components. The ar-
chitecture provides tools for collecting and modelling security contextual data.
We have introduced the concept of partial context and illustrated how it can be
used to request specific authentication methods in order to control access to
protected resources. In the near future, we intend to extend the proposed frame-
work to handle inaccurate or unavailable contextual data, specify a registration
protocol that allows adding context-based access policies to the context engine,
and investigate the use of complex relationships between users’ roles. The use
of contextual graphs |r9] is also a potential methodology for modelling the
security context: to access a resource, the specification of an exhaustive
graph may prevent fraud by including only “safe” cases.

We are currently investigating a test-bed application with a federation of services
inside a university network. Both tutors and students provide plug-and-play
services to be used in a safe environment. These services (or resources) include
course subscription services, online exercises, chat systems, a printing service, etc.
Our model provides an administrative tool to manage authorizations on these
resources based on the user’s role (regular student, auditor, tutor, or guest) and
the context of interaction (time, day, history of the user’s use of the service, etc.).
We believe that this approach will prove to be an interesting starting point for
further investigations of flexible security models for next-generation distributed
authorization frameworks.
Abstract. The knowledge representation tradition in computational lexicon de-
sign represents words as static encapsulations of purely lexical knowledge. We
suggest that this view poses certain limitations on the ability of the lexicon to 
generate nuance-laden and context-sensitive meanings, because word boundaries 
are obstructive, and the impact of non-lexical knowledge on meaning is unac- 
counted for. Hoping to address these problematics, we explore a context- 
centered approach to lexicon design called a Bubble Lexicon. Inspired by Ross 
Quillian’s Semantic Memory System, we represent word-concepts as nodes on a 
symbolic-connectionist network. In a Bubble Lexicon, a word’s meaning is de- 
fined by a dynamically grown context-sensitive bubble; thus giving a more natu- 
ral account of systematic polysemy. Linguistic assembly tasks such as attribute 
attachment are made context-sensitive, and the incorporation of general world 
knowledge improves generative capability. Indicative trials over an implemen-
tation of the Bubble Lexicon lend support to our hypothesis that unpacking
meaning from predefined word structures is a step toward a more natural han- 
dling of context in language. 



1 Motivation 

Packing meaning (semantic knowledge) into words (lexical items) has long been the 
knowledge representation tradition of lexical semantics. However, as the field of 
computational semantics becomes more mature, certain problematics of this paradigm 
are beginning to reveal themselves. Words, when computed as discrete and static 
encapsulations of meaning, cannot easily generate the range of nuance-laden and con- 
text-sensitive meanings that the human language faculty seems able to produce so 
effortlessly. Take one example: Miller and Fellbaum’s popular machine-readable 
lexicon, WordNet [7], packages a small amount of dictionary-type knowledge into 
each word sense, which represents a specific meaning of a word. Word senses are 
partitioned a priori, and the lexicon does not provide an account of how senses are 
determined or how they may be systematically related, a phenomenon known as sys- 
tematic polysemy. The result is a sometimes arbitrary partitioning of word meaning. 
For example, the WordNet entry for the noun form of “sleep” returns two senses, one 
which means “a slumber” (i.e. a long rest), and the other which means “a nap” (i.e. a 
brief rest). The systematic relation between these two senses is unaccounted for, and
their classification as separate senses, indistinguishable from homonyms, gives the
false impression that there is a no-man’s land of meaning in between each predefined
word sense.

Hoping to address the inflexibility of lexicons like WordNet, Pustejovsky’s Gen- 
erative Lexicon Theory (GLT) [19] packs a great deal more meaning into a word 
entity, including knowledge about how a word participates in various semantic roles 
known as “qualia,” which dates back to Aristotle. The hope is that a densely packed 
word-entity will be able to generate a fuller range of nuance-laden meaning. In this 
model, the generative ability of a word is a function of the type and quantity of knowl- 
edge encoded inside that word. For example, the lexical compound “good rock” only 
makes sense because one of the functions encoded into “rock” is “to climb on,” and 
associated with “to climb on” is some notion of “goodness.” GLT improves upon the 
sophistication of previous models; however, as with previous models, GLT represents 
words as discrete and pre-defined packages of meaning. We argue that this underlying 
word-as-prepackaged-meaning paradigm poses certain limitations on the generative 
power of the lexicon. We describe two problematics below: 

1) Artificial word boundary. By representing words as discrete objects with pre- 
defined meaning boundaries, lexicon designers must make a priori and sometimes 
arbitrary decisions about how to partition word senses, what knowledge to encode 
into a word, and what to leave out. This is problematic because it would not be 
feasible (or efficient) to pack into a word all the knowledge that would be needed 
to anticipate all possible intended meanings of that word. 

2) Exclusion of non-lexical knowledge. When representing a word as a predeter- 
mined, static encapsulation of meaning, it is common practice to encode only 
knowledge that formally characterizes the word, namely, lexical knowledge (e.g. 
the qualia structure of GLT). We suggest that non-lexical knowledge such as gen- 
eral world knowledge also shapes the generative power and meaning of words. 
General world knowledge differs from lexical knowledge in at least two ways: 

a) First, general world knowledge is largely concerned with defeasible knowl-
edge, describing relationships between concepts that can hold true or often
hold true (connotative). By comparison, lexical knowledge is usually a more
formal characterization of a word and therefore describes relationships between
concepts that usually hold true (denotative). But the generative power of
words and the richness of natural language may lie in defeasible knowledge. For
example, in interpreting the phrase “funny punch,” it is helpful to know that
“fruit punch can sometimes be spiked with alcohol.” Defeasible knowledge is
largely missing from WordNet, which knows that a “cat” is a “feline”, “carni-
vore”, and “mammal”, but does not know that “a cat is often a pet.” While
some defeasible knowledge has crept into the qualia structures of GLT (e.g. “a
rock is often used to climb on”), most defeasible knowledge does not naturally
fit into any of GLT’s lexically oriented qualia roles.

b) Second, lexical knowledge by its nature characterizes only word-level con-
cepts (e.g. “kick”), whereas general world knowledge characterizes both word-
level and higher-order concepts (e.g. “kick someone”). Higher-order con-
cepts can also add meaning to the word-level concepts. For example, knowing
that “kicking someone may cause them to feel pain” lends a particular interpre-
tation to the phrase “an evil kick.” WordNet and GLT do not address general
world knowledge of higher-order concepts in the lexicon.

It is useful to think of the aforementioned problematics as issues of context. Word 
boundaries seem artificial because meaning lies either wholly inside the context of a 
word, or wholly outside. Non-lexical knowledge, defeasible and sometimes charac- 
terizing higher-order concepts, represents a context of connotation about a word, 
which serves to nuance the interpretation of words and lexical compounds. Consider- 
ing these factors together, we suggest that a major weakness of the word-as- 
prepackaged-meaning paradigm lies in its inability to handle context gracefully. 

Having posed the problematics of the word-as-prepackaged-meaning paradigm as 
an issue of context, we wonder how we might model the computational lexicon so that 
meaning contexts are more seamless and non-lexical knowledge participates in the 
meaning of words. We recognize that this is a difficult proposition with a scope ex- 
tending beyond just lexicon design. The principle of modularity in computational 
structures has been so successful because encapsulations like frames and objects help 
researchers manage complexity when modeling problems. Removing word bounda- 
ries from the lexicon necessarily increases the complexity of the system. This notwith- 
standing, we adopt an experimental spirit and press on. 

In this paper, we propose a context-centered model of the computational lexicon in-
spired by Ross Quillian’s work on semantic memory [21], which we dub a Bubble
Lexicon. The Bubble Lexicon Architecture (BLA) is a symbolic connectionist net- 
work whose representation of meaning is distributed over nodes and edges. Nodes are 
labeled with a word-concept (our scheme does not consider certain classes of words 
such as, inter alia, determiners, prepositions and pronouns). Edges specify both the 
symbolic relation and connectionist strength of relation between nodes. A word- 
concept node has no internal meaning, and is simply meant as a reference point, or, 
indexical feature, (as Jackendoff would call it [9]) to which meaning is attached. 
Without formal word boundaries, the “meaning” of a word becomes the dynamically 
chosen, flexible context bubble (hence the lexicon’s name) around that word’s node. 
The size and shape of the bubble varies according to the strength of association of 
knowledge and the influence of active contexts; thus, meaning is nuanced and made 
context-sensitive. Defeasible knowledge can be represented in the graph with the help 
of the connectionist properties of the network. Non-lexical knowledge involving
higher-order concepts (more than one word) is represented in the graph through
special nodes called encapsulations, so that it may play a role in biasing meaning
determination.
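The representation just described (word-concept nodes with no internal meaning, edges carrying both a symbolic relation and a connectionist weight) maps onto a small graph structure. The sketch below is illustrative only: the class names are hypothetical, the node kinds follow the types introduced in Section 2.3, and the sample edges and weights are invented rather than read off Fig. 1.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class NodeType(Enum):
    NORMAL = "normal"        # word-concept or encapsulated expression
    TRANS = "trans"          # causal TransNode
    CONTEXT = "context"      # externally defined formal context
    TRANSIENT = "transient"  # temporary concept used in thought


@dataclass
class Edge:
    relation: str  # structural relation label, e.g. "ability", "parameter"
    target: str
    weight: float  # connectionist strength

    def __post_init__(self) -> None:
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("edge weights must lie between 0.0 and 1.0")


@dataclass
class Node:
    label: str
    kind: NodeType = NodeType.NORMAL
    activation: float = 0.0  # stable activation energy within the discourse
    edges: List[Edge] = field(default_factory=list)


class BubbleLexicon:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add_node(self, label: str, kind: NodeType = NodeType.NORMAL) -> Node:
        return self.nodes.setdefault(label, Node(label, kind))

    def relate(self, src: str, relation: str, dst: str, weight: float) -> None:
        edge = Edge(relation, dst, weight)  # validate the weight first
        # Nodes carry no internal meaning; they only index the edges.
        self.add_node(src)
        self.add_node(dst)
        self.nodes[src].edges.append(edge)


lex = BubbleLexicon()
lex.add_node("formal auto context", NodeType.CONTEXT)
lex.relate("car", "ability", "drive", 0.9)
lex.relate("drive", "parameter", "fast", 0.7)
lex.relate("formal auto context", "assoc", "car", 0.8)
```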

The nuanceful generative capability of the BLA is demonstrated through the lin- 
guistic assembly task of attribute attachment, which engages some simulation over the 
network. For example, determining the meaning of a lexical compound such as “fast 
car” involves the generation of possible interpretations of how the “fast” and “car” 
nodes are conceptually related through dependency paths, followed by a valuation of 
each generated interpretation with regard to its structural plausibility and contextual 
plausibility. We do not present the Bubble Lexicon as a perfect or complete
solution to computational lexicon design; rather, as the implementation and
indicative trials illustrate, we hope it is a step toward a more elegant solution
to the problem of context in language.

The organization of the rest of this paper is as follows. First, we present a more 
detailed overview of the Bubble Lexicon Architecture, situating the representation in 
the literature. Second, we present mechanisms associated with this lexicon, such as 
context-sensitive interpretation of words and compounds. Third, we discuss an im-
plementation of Bubble Lexicon and evaluate the work through some indicative
trials. Fourth, we briefly review related work. In our conclusion, we revisit the
bigger picture of the mental lexicon.



2 Bubble Lexicon Architecture 

This section introduces the proposed Bubble Lexicon Architecture (BLA) (Fig. 1) 
through several subsections. We begin by situating the lexicon’s knowledge represen- 
tation in the literature of symbolic connectionist networks. Next, we enumerate some 
tenets and assumptions of the proposed architecture. Finally, we discuss the ontology 
of types for nodes, relations, and operators. 




Fig. 1. A static snapshot of a Bubble Lexicon. We selectively depict some nodes and edges 
relevant to the lexical items “car”, “road”, and “fast”. Edge weights are not shown. Nodes 
cleaved in half are causal trans-nodes. The black nodes are context-activation nodes. 
2.1 Knowledge Representation Considerations

A Bubble Lexicon is represented by a symbolic-connectionist network specially pur- 
posed to serve as a computational lexicon. Nodes function as indices for words, lexi- 
cal compounds (linguistic units larger than words, such as phrases), and formal con- 
texts (e.g. a discourse topic). Edges are labeled dually with a minimal set of
structural dependency relations to describe the relationships between nodes, and with a
numerical weight. Operators are special relations which can hold between nodes, 
between edges, and between operator relations themselves; they introduce boolean 
logic and the notion of ordering, which is necessary to represent certain types of 
knowledge (e.g. ordering is needed to represent a sequence of events). 

Because the meaning representation is distributed over the nodes and edges, words 
only have an interpretive meaning, arising out of some simulation of the graph. 
Spreading activation (cf. [5]) is ordinarily used in semantic networks to determine 
semantic proximity. We employ a version of spreading activation to dynamically 
create a context bubble of interpretive meaning for a word or lexical compound. In 
growing and shaping the bubble, our spreading activation algorithm tries to model the 
influence of active contexts (such as discourse topic), and of relevant non-lexical 
knowledge, both of which contribute to meaning. 
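A basic form of spreading activation can be sketched as follows. The decay factor, the threshold and the rule that energy is the parent's activation times the edge weight times the decay are common textbook choices assumed here; the paper's own variant, including how active contexts shape the bubble, is not specified at this point. The graph is a plain adjacency dictionary with invented weights.

```python
from collections import deque
from typing import Dict, List, Tuple

# node -> [(relation, neighbour, weight)]
Graph = Dict[str, List[Tuple[str, str, float]]]


def grow_bubble(graph: Graph, origin: str, decay: float = 0.8,
                threshold: float = 0.2) -> Dict[str, float]:
    """Spread activation from origin; nodes above threshold form the bubble."""
    activation = {origin: 1.0}
    frontier = deque([origin])
    while frontier:
        node = frontier.popleft()
        for _relation, neighbour, weight in graph.get(node, []):
            energy = activation[node] * weight * decay
            # Only propagate energy that improves on what the node already has.
            if energy >= threshold and energy > activation.get(neighbour, 0.0):
                activation[neighbour] = energy
                frontier.append(neighbour)
    return activation


lexicon = {
    "car": [("ability", "drive", 0.9), ("part", "wheel", 0.6)],
    "drive": [("parameter", "fast", 0.7), ("location", "road", 0.8)],
    "road": [("property", "material", 0.5)],
}
bubble = grow_bubble(lexicon, "car", threshold=0.45)
print(sorted(bubble))  # ['car', 'drive', 'road', 'wheel']
```

Because `decay < 1` and weights are at most 1.0, energy strictly shrinks along every edge, so cycles cannot raise a node's activation and the loop terminates. Lowering the threshold grows the bubble (with the default of 0.2, “fast” is included).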

Some properties of the representation are further discussed below. 

Connectionist weights. Connectionism and lexicon design are not usually considered 
together because weights tend to introduce significant complexity to the lexicon. 
However, there are several reasons why connectionism is necessary to gracefully 
model the context problem in the lexicon. 

First, not all knowledge contributes equally to a word’s meaning, so we need nu-
merical weights on edges as an indication of semantic relevance, and to distinguish
certain from defeasible knowledge. Defeasible knowledge may in most cases
be less central to a word’s meaning, but in certain contexts, its influence is felt.

Second, connectionist weights lend the semantic network notions of memory and 
learning, exemplified in [16], [17], and [22]. For the purposes of growing a computa- 
tional lexicon, it may be desirable to perform supervised training on the lexicon to 
learn particular meaning bubbles for words, under certain contexts. Learning can also 
be useful when importing existing lexicons into a Bubble Lexicon through an exposure 
process similar to semantic priming [1]. 

Third, connectionism gives the graph intrinsic semantics, meaning that even with- 
out symbolic labels on nodes and edges, the graded inter-connectedness of nodes is 
meaningful. This is useful in conceptual analogy over Bubble Lexicons. Goldstone 
and Rogosky [8] have demonstrated that it is possible to identify conceptual corre- 
spondences across two connectionist webs without symbolic identity. If we are also 
given symbolic labels on relations, as we are in BLA, the structure-mapping analogy- 
making methodology described by Falkenhainer et al. [6] becomes possible. 

Finally, although not the focus of this paper, a self-organizing connectionist lexicon 
would help to support lexicon evolution tasks such as lexical acquisition (new word 
meanings), generalization (merging meanings), and individuation (cleaving mean-
ings). A discussion of this appears elsewhere [11].

Ontology of Conceptual Dependency Relations. In a Bubble Lexicon, edges are 
relations which hold between word, compound, and context nodes. In addition to 
having a numerical weight as discussed above, edges also have a symbolic label repre- 
senting a dependency relation between the two words/concepts. The choice of the 
relational ontology represents an important tradeoff. Very relaxed ontologies that 
allow for arbitrary predicates like bite(dog, mailman) in Peirce’s existential
graphs [18] or node-specific predicates as in Brachman’s description logics system [2] 
are not suitable for highly generalized reasoning. Efforts to engineer ontologies that 
enumerate a priori a complete set of primitive semantic relations, such as Ceccato’s 
correlational nets [3], Masterman's primitive concept types [14], and Schank’s Con- 
ceptual Dependency [23], show little agreement and are difficult to engineer. A small 
but insufficiently generic set of relations such as WordNet’s nyms [7] could also se- 
verely curtail the expressive power of the lexicon. 

Because lexicons emphasize words, we want to focus meaning around the word- 
concept nodes rather than on the edges. Thus we propose a small ontology of generic 
structural relations for the BLA. For example, instead of grow(tree, fast), we
have ability(tree, grow) and parameter(grow, fast). These relations
are meant as a more expressive version of those found in Quillian’s original Semantic
Memory System. These structural relations become useful to linguistic assembly tasks 
when building larger compound expressions from lexical items. They can be thought 
of as a sort of semantic grammar, dictating how concepts can assemble. 
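The idea of structural relations acting as a semantic grammar can be shown with plain triples. In this hypothetical sketch, `to_structural` rewrites an ad hoc predicate such as grow(tree, fast) into the generic ability and parameter relations, and `attachable_modifiers` applies one assumed assembly rule: a modifier can attach to a concept through one of that concept's abilities. Neither function is part of the published BLA.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (relation, source, target)


def to_structural(verb: str, subject: str, modifier: str) -> List[Triple]:
    """Re-express an ad hoc predicate verb(subject, modifier) with generic relations."""
    return [("ability", subject, verb), ("parameter", verb, modifier)]


def attachable_modifiers(triples: List[Triple], concept: str) -> Set[str]:
    # Hypothetical assembly rule: a modifier attaches to a concept through
    # one of the concept's abilities (ability -> parameter).
    abilities = {dst for rel, src, dst in triples if rel == "ability" and src == concept}
    return {dst for rel, src, dst in triples if rel == "parameter" and src in abilities}


facts = to_structural("grow", "tree", "fast") + to_structural("move", "car", "fast")
print(attachable_modifiers(facts, "tree"))  # {'fast'}
```

Because the rule mentions only the generic relation names, the same query works for any concept, which is the practical payoff of keeping the relation ontology small.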

2.2 Tenets and Assumptions 

Tenets. While the static graph of the BLA (Fig. 1) depicts the meaning representa- 
tion, it is equally important to talk about the simulations over the graph, which are 
responsible for meaning determination. We give two tenets below: 

1) No coherent meaning without simulation. In the Bubble Lexicon graph, different 
and possibly conflicting meanings can attach to each word-concept node; therefore, 
words hardly have any coherent meaning in the static view. We suggest that when 
human minds think about what a word or phrase means, meaning is always evaluated 
in some context. Similarly, a word only becomes coherently meaningful in a bubble 
lexicon as a result of simulation (graph traversal) via spreading activation (edges are 
weighted, though Fig. 1 does not show the weights) from the origin node, toward some 
destination. This helps to exclude meaning attachments which are irrelevant in the
current context, to pin down a more coherent meaning.

2) Activated nodes in the context bias interpretation. The meaning of a word or
phrase is the collection of nodes and relations it has “harvested” along the path toward
its destination. However, there may be multiple paths representing different interpre-
tations, perhaps each representing one “word sense”. In BLA, the relevance of each
word sense path depends upon context biases near the path which may boost the acti-
vation energy of that path. Thus meaning is naturally influenced by context, as con-
text nodes prefer certain interpretations by activating certain paths.
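The two tenets can be illustrated on the attribute-attachment example of “fast car”. The sketch assumes, for illustration only, that structural plausibility is the product of edge weights along a path and that contextual plausibility multiplies it by one plus the summed context boosts of the nodes the path visits. The functions, the graph and the “fast-drying paint” reading are all hypothetical. Without context, the driving reading wins; activating painting-related context nodes flips the ranking.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, str, float]]]
Path = List[Tuple[str, str, str]]  # (source, relation, target) steps


def dependency_paths(graph: Graph, src: str, dst: str,
                     max_len: int = 4) -> List[Tuple[Path, float]]:
    """Enumerate simple paths src -> dst with the product of their edge weights."""
    found: List[Tuple[Path, float]] = []

    def dfs(node: str, path: Path, weight: float, seen: frozenset) -> None:
        if node == dst:
            found.append((path, weight))
            return
        if len(path) >= max_len:
            return
        for relation, nbr, w in graph.get(node, []):
            if nbr not in seen:
                dfs(nbr, path + [(node, relation, nbr)], weight * w, seen | {nbr})

    dfs(src, [], 1.0, frozenset({src}))
    return found


def rank_interpretations(graph: Graph, src: str, dst: str,
                         context: Dict[str, float]) -> List[Tuple[float, Path]]:
    """Score = structural plausibility x contextual plausibility."""
    ranked = []
    for path, structural in dependency_paths(graph, src, dst):
        boost = sum(context.get(target, 0.0) for _, _, target in path)
        ranked.append((structural * (1.0 + boost), path))
    return sorted(ranked, key=lambda item: item[0], reverse=True)


lexicon = {
    "car": [("ability", "drive", 0.9), ("property", "paint", 0.5)],
    "drive": [("parameter", "fast", 0.8)],
    "paint": [("ability", "dry", 0.6)],
    "dry": [("parameter", "fast", 0.7)],
}
print(rank_interpretations(lexicon, "car", "fast", {})[0][1][0][2])  # drive
painting = {"paint": 3.0, "dry": 3.0}
print(rank_interpretations(lexicon, "car", "fast", painting)[0][1][0][2])  # paint
```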

Assumptions. We have made the following assumptions about our representation: 

1) Nodes in BLA are word-concepts. We do not give any account of words like 
determiners, pronouns, and prepositions. 

2) Nodes may also be higher-order concepts like “fast car,” constructed through en- 
capsulation. In lexical evolution, intermediate transient nodes also exist. 

3) In our examples, we show selected nodes and edges, although the success of such a 
lexicon design thrives on the network being sufficiently well-connected and dense. 

4) Homonyms, which are non-systematic word senses (e.g. fast: not eat, vs. quick) are 
represented by different nodes. Only systematic polysemy shares the same node. We 
assume we can cleanly distinguish between these two classes of word senses. 

5) Though not shown, relations are always numerically weighted between 0.0 and 1.0,
in addition to the predicate label, and nodes also have a stable activation energy,
which is a function of how often a node is active within the current discourse.



2.3 Ontology of Nodes, Relations, and Operators 



We propose three types of nodes (Fig. 2). Normal nodes may be word-concepts, or 
larger encapsulated lexical expressions. However, some kinds of meaning, e.g. actions,
beliefs, and implications, are difficult to represent because they have some notion of
syntax. Some semantic networks have overcome this problem by introducing a causal
relation [22], [17]. We opted for a causal node called a TransNode because we feel 
that it offers a more precise account of causality as being inherent in some word- 
concepts, like actions. This also allows us to maintain a generic structural relation 
ontology. Because meaning determination is dynamic, TransNodes behave causally 
during simulation. TransNodes derive from Minsky’s general interpretation [15] of
Schankian transfer [23], and are explained more fully elsewhere [11].
Fig. 2. Ontology of node, relation, and operator types.



While normal nodes can act as contexts when they are activated in the BLA, there is
no formal definition of those groupings. We suggest that sometimes, human minds
may employ more formal and explicit notions of context which define a topic or do-
main of discourse (e.g. “automotives,” “finance”). For example, the meaning of the
formal context “finance” is somewhat different from the meaning that is attached to
that word-concept node. For one, the formal context “finance” may be a well-defined
term in the financial community. The external definition of certain concepts like for-
mal contexts is supported by Putnam’s idea of semantic externalism [20]. We intro-
duce ContextNodes as an explicit representation of externally-defined formal con-
texts. ContextNodes use the assoc (generic association) relation, along with operators,
to cause the network to be in some state when they are activated. They can be thought 
of as externally grounded contexts. Meta-level ContextNodes that control a layer of 
ContextNodes are also possible. In Figure 1, the “formal auto context” ContextNode is 
meant to represent formally the domain of automotives, to the best of a person’s un- 
derstanding of the community definition of that context. 

Because ContextNodes help to group and organize nodes, they are also useful in 
representing perspectives, just as a semantic frame might. Let us consider again the 
example of a car, as depicted in Figure 1. A car can be thought of as an assembly of
its individual parts, or it can be thought of functionally as something that is a type of 
transportation that people use to drive from point A to point B. We can use Con- 
textNodes to separate these two perspectives of a car. After all, we can view a per- 
spective as a type of packaged context. 

So far we have only talked about nodes that are stable word-concepts and stable
contexts in the lexicon. These can be thought of as stable in memory, changing only
slowly. However, it is also desirable to represent more temporary concepts, such as
those used in thought. For example, to reason about “fast cars”, one might
encapsulate one particular sense path of “fast car” into a TransientNode. Or one can
instantiate a concept and overload its meaning. TransientNodes explain how fleeting
concepts in thought can be reconciled with the lexicon, which contains more stable
elements. Letting stable concepts and the ideas constructed from them interact in one
network should not seem strange, because the human mental model draws no line
between them. In the next section we illustrate the instantiation of a TransientNode.

We present a small ontology of structural relations to represent fairly generic
structural relations between concepts. Object-oriented programming notation is a
useful shorthand, because the process of meaning determination in the network
engages in structural marker passing of relations, in which symbol binding occurs. It
is also important to remember that each edge carries not only a named relation but
also a numerical weight. Connectionist weights are critically important in all
processes of Bubble Lexicons, especially spreading activation and learning.
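The point that every edge carries both a named relation and a numerical weight can be pictured as a small data structure. The sketch below is an illustration under assumptions: the relation names and weights are invented for the car example and are not taken from Figure 2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str  # named structural relation (hypothetical names below)
    target: str
    weight: float  # connectionist weight used in spreading activation and learning


class Lexicon:
    """Toy Bubble Lexicon graph: outgoing edges indexed by source node."""

    def __init__(self) -> None:
        self.out_edges: dict[str, list[Edge]] = defaultdict(list)

    def add(self, source: str, relation: str, target: str, weight: float) -> None:
        self.out_edges[source].append(Edge(source, relation, target, weight))

    def neighbors(self, node: str) -> list[Edge]:
        # Strongest associations first; unknown nodes have no edges.
        return sorted(self.out_edges.get(node, []), key=lambda e: e.weight, reverse=True)


lex = Lexicon()
lex.add("car", "hasPart", "tires", 0.6)
lex.add("car", "ability", "drive", 0.9)
lex.add("car", "property", "top speed", 0.8)
for edge in lex.neighbors("car"):
    print(edge.relation, edge.target, edge.weight)
```

Keeping the relation name and the weight on the same edge object lets structural marker passing (which reads the names) and spreading activation (which reads the weights) walk the same graph.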

Operators put certain conditions on relations. In Figure 1, road material may only
take on the value of pavement or dirt, and not both at once. Some operators hold only
in a certain instantiation or a certain context, so operators can be conditionally
activated by a context or node. For example, a person can drive and walk, but under
the time context (i.e. at any one moment), a person can only drive XOR walk.
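A context-conditional operator can be sketched as a constraint that is enforced only while its governing context is active. The Operator class, its kind strings, and the representation of chosen values as a set are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operator:
    """Constraint on the values a relation may take (hypothetical representation)."""
    kind: str                      # "XOR": exactly one option; "OR": at least one
    options: frozenset
    context: Optional[str] = None  # if set, the operator holds only in this context

    def allows(self, chosen: set, active_contexts: set) -> bool:
        if self.context is not None and self.context not in active_contexts:
            return True  # operator not activated: no constraint imposed
        hits = len(chosen & self.options)
        if self.kind == "XOR":
            return hits == 1
        if self.kind == "OR":
            return hits >= 1
        raise ValueError(f"unknown operator kind: {self.kind}")


road_material = Operator("XOR", frozenset({"pavement", "dirt"}))
locomotion = Operator("XOR", frozenset({"drive", "walk"}), context="time")

print(road_material.allows({"pavement", "dirt"}, set()))  # False: not both at once
print(locomotion.allows({"drive", "walk"}, set()))        # True: a person can drive and walk
print(locomotion.allows({"drive", "walk"}, {"time"}))     # False: under the time context
```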

3 Bubble Lexicon Mechanisms 

We now explain the processes that are core themes of the Bubble Lexicon. 

Meaning Determination. One of the important tenets of the lexicon’s representation
in Bubble Lexicons is that coherent meaning can only arise out of simulation. That is
to say, out of context, word-concepts have so many possible meanings associated with
each of them that we can only hope to make sense of a word by putting it into some
context, be it a formal topic area (e.g. traversing from “car” toward the ContextNode
of “transportation”) or a lexical context (e.g. traversing from “car” toward the
normal node of “fast”). We motivate this meaning-as-simulation idea with the example
of attribute attachment for “fast car”, as depicted in Figure 1. Figure 3 shows some
of the different interpretations generated for “fast car”.

As illustrated in Figure 3, “fast car” produces many different interpretations given
no other context. A novel feature of Bubble Lexicons is that structural messages are
passed along with numerical weights. For example, in Figure 1, “drying time” will not
always relate to “fast” in the same sense: it depends on whether pavement or a
washed car is drying. The history of the traversal therefore nuances the meaning of
the current node. Unlike Charniak’s earlier notion of marker passing [4], which is
used to mark paths, structural marker passing in Bubble Lexicons is accretive,
meaning that each node contributes to the message being passed.
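Accretive marker passing can be shown in a few lines. This sketch assumes that each node simply appends its own name to the message in dot notation; the node names are hypothetical simplifications of the “fast car” example.

```python
def pass_structural_marker(path: list[str], attribute: str) -> str:
    """Accretive marker passing: every node extends the message it receives
    instead of merely flagging itself as visited."""
    message = path[0]
    for node in path[1:]:
        message = f"{message}.{node}"  # the node contributes itself to the message
    return f"{message} = {attribute}"


car_paint = ["car", "paint", "drying_time"]
road_pavement = ["car", "road", "pavement", "drying_time"]
print(pass_structural_marker(car_paint, "fast"))      # car.paint.drying_time = fast
print(pass_structural_marker(road_pavement, "fast"))  # car.road.pavement.drying_time = fast
```

The same node, drying_time, ends up carrying two different messages depending on the traversal history, which is what allows the history to nuance the meaning of the current node.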



[Figure 3: traversal paths for “fast car”, with edges labeled by the structural messages passed; the diagrams are not recoverable. Panel captions:]

(a) The car whose top speed is fast.
(b) The car that can be driven at a speed that is fast.
(c) The car whose tires have a rating that is fast.
(d) The car whose paint has a drying time that is fast.
(e) The car that can be washed at a speed that is fast.
(f) The car that can be driven on a road whose speed limit is fast.
(g) The car that can be driven on a road whose road material is pavement, whose drying time is fast.



Fig. 3. Different meanings of “fast car,” resulting from network traversal. Numerical weights
and other context nodes are not shown. Edges are labeled with message passing, in OOP
notation. The i-th letter corresponds to the i-th node in a traversal.



Although graph traversal produces many meanings for “fast car,” most of the senses
will not be very energetic; that is to say, they are not very plausible in most
contexts. The senses given in Figure 3 are ordered by plausibility, which is
determined by the activation energy of the traversal path. Spreading activation
across a traversal path differs from classical spreading activation as described in
the literature.

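$$A_{x,ij} = \sum_{n=i}^{j} A_n \,\lvert W_{n-1,n} \rvert \qquad (1)$$

$$A_{x,ij} = \Psi_x \sum_{n=i}^{j} \Big( A_n \,\lvert W_{n-1,n} \rvert + \sum_{c}^{\text{active contexts}} \sum_{y} A_{y,cn} \Big) \qquad (2)$$

Here A_n is the activation energy of node n, W_{n-1,n} is the weight of the edge leading into node n along path x, A_{y,cn} is the activation energy of the y-th path from an active context c to node n, and Ψ_x is the structural plausibility of the message passed along path x.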



Equation (1) shows how a typical activation energy for the x-th path between nodes i
and j is calculated in classical spreading activation systems. It is the sum, over all
nodes n in the path, of the activation energy of node n times the magnitude of the
edge weight leading into node n. In a Bubble Lexicon, however, we would like to make
use of extra information to arrive at a more precise evaluation of a path’s
activation energy, especially relative to all other paths between i and j. This can
be thought of as meaning disambiguation, because in the end we inhibit the incorrect
paths, which represent incorrect interpretations.
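The classical path activation of equation (1) can be computed directly. A minimal sketch, assuming activations and edge weights are given as dictionaries and that the start node i, which has no incoming edge on the path, contributes with unit weight (our convention, not stated in the text); all values are hypothetical.

```python
def classical_path_activation(path, activation, weight):
    """Sum over nodes n on the path of A_n * |W into n| (classical spreading activation)."""
    total = activation[path[0]] * 1.0  # start node: assumed unit incoming weight
    for prev, node in zip(path, path[1:]):
        total += activation[node] * abs(weight[(prev, node)])
    return total


activation = {"car": 1.0, "top speed": 0.5, "fast": 1.0}
weight = {("car", "top speed"): 0.5, ("top speed", "fast"): 0.75}
print(classical_path_activation(["car", "top speed", "fast"], activation, weight))  # 2.0
```

Only the magnitude of each weight enters the sum, so a strongly negative (inhibitory) edge contributes as much energy as a strongly positive one; distinguishing between competing paths is left to the extra terms a Bubble Lexicon adds.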

To perform this disambiguation, the influence of contexts that are active (i.e. other 
parts of the lexical expression, relevant and active non-lexical knowledge, discourse 
context, and topic ContextNodes), and the plausibility of the structural message being 
passed, are factored in. If we are evaluating a traversal path in a larger context, such 
as a part of a sentence or larger discourse structure, or some topic is active, then there 
will likely be a set of word-concept nodes and ContextNodes which have remained 
active. These contexts are factored into our spreading activation valuation function (2) 
as the sum over all active contexts c of all paths from c to n. 

The plausibility of the structural message being passed is also important. 

Admittedly, for different linguistic assembly tasks, different heuristics will be needed. 
In attribute attachment (e.g. adj-noun compounds), the heuristic is fairly straightfor- 
ward: The case in which the attribute characterizes the noun-concept directly is pre- 
ferred, followed by the adjective characterizing the noun-concept’s ability or use (e.g. 
Fig. 3(b)) or subpart (e.g. Fig. 3(a,c,d)), followed by the adjective characterizing some 
external manipulation of the noun-concept (e.g. Fig. 3(e)). What is not preferred is 
when the adjective characterizes another noun-concept that is a sister concept (e.g. 
Fig. 3(f,g)). Our spreading activation function (2) incorporates classic spreading acti- 
vation considerations of node activation energy and edge weight, with context influ- 
ence on every node in the path, and structural plausibility. 

Recall that the plausibility ordering given in Figure 3 assumed no major active
contexts. However, let us consider how the interpretation might change had the
discourse context been a conversation at a car wash. In such a case, “car wash” might
be an active ContextNode, so the meaning depicted in Fig. 3(e) would receive
increased activation energy from the context term for the path from “car wash” to
“wash”. This boost makes (e) a plausible, if not the preferred, interpretation.
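The context- and structure-aware evaluation described above, including the car-wash re-ranking, can be sketched as follows. Assumptions (ours, not the paper's): structural plausibility is a multiplicative factor whose values encode the attachment heuristic; the influence of active contexts on a node is given as precomputed energies keyed by (context, node); all activations and weights are invented for illustration.

```python
# Hypothetical plausibility factors encoding the attribute-attachment heuristic.
PLAUSIBILITY = {
    "direct": 1.0,          # attribute characterizes the noun-concept directly
    "ability_or_use": 0.8,  # e.g. Fig. 3(b)
    "subpart": 0.8,         # e.g. Fig. 3(a, c, d)
    "external": 0.5,        # external manipulation, e.g. Fig. 3(e)
    "sister": 0.2,          # a sister concept, e.g. Fig. 3(f, g)
}


def path_energy(path, structure, activation, weight, context_energy):
    """Node activation times incoming edge weight, plus active-context influence,
    summed over the path and scaled by structural plausibility."""
    total = 0.0
    for k, node in enumerate(path):
        # The start node has no incoming edge on the path; assume unit weight.
        w = 1.0 if k == 0 else abs(weight[(path[k - 1], node)])
        # Energy reaching this node from every active context c.
        context_term = sum(e for (c, n), e in context_energy.items() if n == node)
        total += activation[node] * w + context_term
    return PLAUSIBILITY[structure] * total


def rank(candidates, activation, weight, context_energy):
    """Interpretation labels, most plausible first."""
    return sorted(
        candidates,
        key=lambda label: path_energy(*candidates[label], activation, weight, context_energy),
        reverse=True,
    )


activation = {"car": 1.0, "top speed": 0.5, "fast": 1.0, "wash": 0.5, "speed": 0.5}
weight = {("car", "top speed"): 0.5, ("top speed", "fast"): 0.5,
          ("car", "wash"): 0.5, ("wash", "speed"): 0.5, ("speed", "fast"): 0.5}
candidates = {
    "(a) top speed is fast": (["car", "top speed", "fast"], "subpart"),
    "(e) can be washed at a fast speed": (["car", "wash", "speed", "fast"], "external"),
}
print(rank(candidates, activation, weight, {}))
# ['(a) top speed is fast', '(e) can be washed at a fast speed']
print(rank(candidates, activation, weight, {("car wash", "wash"): 2.0}))
# ['(e) can be washed at a fast speed', '(a) top speed is fast']
```

With no active context, (a) wins (1.4 against 1.0); once a strongly activated “car wash” context feeds energy into the “wash” node, (e) scores 2.0 and overtakes it, while (a), whose path never touches “wash”, is unaffected.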





Fig. 4. Encapsulation. One meaning of “fast car” is encapsulated into a TransientNode, making 
it easy to reference and overload. 

Encapsulation. Once a specific meaning is determined for a lexical compound, it
may be desirable to refer to it, so we assign it a new index. This happens through a
process called encapsulation, in which a specific traversal of the network is captured
in a new TransientNode. (Of course, if the node is used often enough, over time it may
become a stable node.) The new node inherits just the specific relations present in the
nodes along the traversal path. Figure 4 illustrates sense (b) of “fast car”.
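Encapsulation can be sketched as copying just the relations that lie on one traversal into a new node. The TransientNode class, the triple format for relations, and the relation names are hypothetical; the path follows sense (b), “the car that can be driven at a speed that is fast.”

```python
from dataclasses import dataclass, field

Relation = tuple[str, str, str]  # (source, relation name, target)


@dataclass
class TransientNode:
    name: str
    relations: list[Relation] = field(default_factory=list)


def encapsulate(name: str, path: list[str], edges: list[Relation]) -> TransientNode:
    """Capture one traversal in a new TransientNode, inheriting only the
    relations that link consecutive nodes on the traversal path."""
    steps = set(zip(path, path[1:]))
    return TransientNode(name, [(s, r, t) for (s, r, t) in edges if (s, t) in steps])


edges = [
    ("car", "ability", "drive"),
    ("drive", "hasSpeed", "speed"),
    ("speed", "value", "fast"),
    ("car", "hasPart", "tires"),
    ("tires", "rating", "fast"),
]
fast_car_b = encapsulate("fast car (b)", ["car", "drive", "speed", "fast"], edges)
print(fast_car_b.relations)
```

The tire-rating relations, which belong to sense (c), are left behind, so the new node can be referenced and later overloaded without dragging along competing senses.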

More than just lexical compounds can be encapsulated. For example, groupings of 
concepts (such as a group of specific cars) can be encapsulated, along with objects that 
share a set of properties or descriptive features (Jackendoff calls these kinds [9]), and 
even assertions and whole lines of reasoning can be encapsulated (with the help of the 
Boolean and ordering operators). And encapsulation is more than just a useful way of 
abstraction-making. Once a concept has been encapsulated, its meaning can be over- 
loaded, evolving away from the original meaning. For example, we might instantiate 
“car” into “Mary’s car,” and then add a set of properties specific to Mary’s car. We 
believe encapsulation, along with classical weight learning, supports accounts of lexi- 
cal evolution, namely, it helps to explain how new concepts may be acquired, concepts 
may be generalized (concept intersection), or individuated (concept overloading). 
Lexical evolution mechanisms are discussed elsewhere [11]. 
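Instantiation followed by overloading behaves much like a layered lookup: the new concept answers from its own properties first and falls back to the original concept otherwise. A minimal sketch that uses Python's standard collections.ChainMap as a stand-in; the property names are hypothetical.

```python
from collections import ChainMap

car = {"isA": "vehicle", "wheels": 4, "color": None}

# Instantiate "car" into "Mary's car": new and overloaded properties live in
# the first map, while everything else is still inherited from "car".
marys_car = ChainMap({"owner": "Mary", "color": "red"}, car)

marys_car["dent"] = "rear bumper"  # writes go to Mary's car only
print(marys_car["wheels"], marys_car["color"], car["color"])  # 4 red None
print(dict(marys_car.maps[0]))  # {'owner': 'Mary', 'color': 'red', 'dent': 'rear bumper'}
```

Because writes never touch the parent map, Mary's car can evolve away from the original meaning while the stable “car” concept stays intact.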

Importing Existing Knowledge into the Bubble Lexicon. One question which may 
be looming in the reader’s mind is how a Bubble Lexicon might be practically con- 
structed. One practical solution is to bootstrap the network by learning frame knowl- 
edge from existing lexicons, such as GLT, or even Cyc [10], a database of lexical and 
non-lexical world knowledge. Taking the example of Cyc, we might map Cyc con- 
tainers into nodes, predicates into TransNodes, and map micro-theories (Cyc’s version 
of contexts) into ContextNodes, which activate concepts within each micro-theory. 
Assertional knowledge can be encapsulated into new nodes. To learn the intrinsic 
weights on edges, supervised learning can be used to semantically prime the network 
to the knowledge being imported. Cyc suffers from the problem of rigidity, especially 
contextual rigidity, as exhibited by microtheories which pre-fix context boundaries. 
However, once imported into a Bubble Lexicon, meaning determination may become 
more dynamic and context-sensitive. Contexts will evolve, based on the notion of 
lexical utility, not just on predefinition. 
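The proposed import from Cyc can be sketched over a simplified, hypothetical assertion format (microtheory, predicate, argument, argument). This uses no real Cyc API, the constant names are invented, and the supervised learning of edge weights is not shown.

```python
from collections import defaultdict


def import_assertions(assertions):
    """Map (microtheory, predicate, a, b) tuples onto Bubble Lexicon node kinds."""
    nodes: set[str] = set()
    trans_nodes: set[str] = set()
    context_nodes: dict[str, set[str]] = defaultdict(set)
    encapsulated: list[str] = []
    for mt, predicate, a, b in assertions:
        nodes.update((a, b))                 # containers -> ordinary nodes
        trans_nodes.add(predicate)           # predicates -> TransNodes
        context_nodes[mt].update((a, b))     # micro-theory -> ContextNode activating its concepts
        encapsulated.append(f"{predicate}({a}, {b})")  # assertion -> new encapsulated node
    return nodes, trans_nodes, dict(context_nodes), encapsulated


kb = [
    ("TransportationMt", "isa", "Car", "Vehicle"),
    ("TransportationMt", "drives", "Person", "Car"),
]
nodes, preds, contexts, assertions = import_assertions(kb)
print(sorted(nodes))                         # ['Car', 'Person', 'Vehicle']
print(sorted(contexts["TransportationMt"]))  # ['Car', 'Person', 'Vehicle']
print(assertions)                            # ['isa(Car, Vehicle)', 'drives(Person, Car)']
```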



4 Implementation 

To test the ideas put forth in this paper, we implemented a Bubble Lexicon over an
adapted subset of the Open Mind Commonsense Semantic Network (OMCSNet) [13],
which is based on the Open Mind Commonsense knowledge base [24]. We use the adaptive
weight training algorithm developed for a Commonsense Robust Inference System
(CRIS) [12]. OMCSNet is a large-scale semantic network of 140,000 items of general
world knowledge, including lexical and non-lexical, certain and defeasible knowledge.
Its scale provides the BLA with a rich basis from which meaning can be drawn.

With the goal of running trials, edge weights were assigned an a priori fixed value
based on the type of relation. The spreading activation evaluation function described
in equation (2) was implemented. We also labeled three existing nodes in OMCSNet as
ContextNodes and translated these nodes’ hasCollocate relations into the assoc
relation of the Bubble Lexicon. Predefining nodes, while not generally necessary, was
done in this case to make it easier to observe the effects of context bias in
indicative trials. Trials were run over four lexical compounds, alternately activating
each of these ContextNodes and the null ContextNode. Context activations were set to a
very high value to exaggerate, for illustrative purposes, the effect of context on
meaning determination. Table 1 summarizes the results.
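The trial setup can be summarized as a small configuration sketch. The split of relation types into lexical and non-lexical and the weight values are hypothetical (the text states only that weights were fixed a priori by relation type); the compounds and contexts are those of Table 1, with None standing for the null ContextNode.

```python
from itertools import product

# Hypothetical split of relation types; not OMCSNet's actual relation inventory.
LEXICAL_RELATIONS = {"isA", "partOf", "hasProperty"}
FIXED_WEIGHTS = {"lexical": 1.0, "non-lexical": 0.5}  # hypothetical a priori values
CONTEXT_ACTIVATION = 100.0  # deliberately very high, to exaggerate context effects

COMPOUNDS = ["fast horse", "cheap apartment", "love tree", "talk music"]
CONTEXTS = [None, "money", "culture", "transportation"]  # None = null ContextNode


def translate_relation(relation: str) -> str:
    """hasCollocate edges of the labeled ContextNodes become generic assoc edges."""
    return "assoc" if relation == "hasCollocate" else relation


def fixed_weight(relation: str) -> float:
    kind = "lexical" if translate_relation(relation) in LEXICAL_RELATIONS else "non-lexical"
    return FIXED_WEIGHTS[kind]


def trial_grid():
    """Every compound is run once under each ContextNode and once with none."""
    for compound, context in product(COMPOUNDS, CONTEXTS):
        activations = {} if context is None else {context: CONTEXT_ACTIVATION}
        yield compound, context, activations
```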

One discrepancy between the proposed and implemented systems is that assertional 
knowledge (e.g. “Gas money to work can be cheap”) in the implementation is allowed 
to be in the traversal path. Assertional knowledge is encapsulated as nodes. 

The creative and nuanced interpretations produced by the BLA clearly demonstrate
the effects of active context on meaning determination. The incorporation of
non-lexical knowledge into the phrasal meaning is visible (e.g. “Horse that races,
which wins money, is fast”). By comparison, WordNet and GLT would not have
produced the varied and informal interpretations produced by the BLA.



Table 1. Results of trials illustrate effects of active context on attribute attachment.

Compound        | Context        | Score (%) | Top interpretation
----------------+----------------+-----------+-------------------------------------------------------------------
Fast horse      | (none)         | 30        | Horse that is fast.
Fast horse      | money          | 60        | Horse that races, which wins money, is fast.
Fast horse      | culture        | 30        | Horse that is fast
Fast horse      | transportation | 55        | Horse is used to ride, which can be fast.
Cheap apartment | (none)         | 22        | Apartment that has a cost which can be cheap.
Cheap apartment | money          | 80        | Apartment that has a cost which can be cheap.
Cheap apartment | culture        | 60        | Apartment is used for living, which is cheap in New York.
Cheap apartment | transportation | 20        | Apartment that is near work; Gas money to work can be cheap
Love tree       | (none)         | 15        | Tree is a part of nature, which can be loved
Love tree       | money          | 25        | Buying a tree costs money; money is loved.
Love tree       | culture        | 25        | People who are in love kiss under a tree.
Love tree       | transportation | 20        | Tree is a part of nature, which can be loved
Talk music      | (none)         | 30        | Music is a language which has use to talk.
Talk music      | money          | 22        | Music is used for advertisement, which is an ability of talk radio.
Talk music      | culture        | 30        | Music that is classical is talked about by people.
Talk music      | transportation | 30        | Music is used in elevators where people talk.



However, the implementation also reveals some difficulties associated with the BLA. 
Meaning interpretation is very sensitive to the quality and signal-to-noise ratio of 
concepts/relations/knowledge present in the lexicon, which in our case, amounts to 
knowledge present in OMCSNet. For example, in the last example in Table 1, “talk 
music” in the transportation context was interpreted as “music is used in elevators, 
where people talk.” This interpretation singled out elevators, even though music is 
played in buses, cars, planes, and elsewhere in transportation. This has to do with the 
sparseness of relations in OMCSNet. Although those other transportation concepts 
existed, they were not properly connected to “music”. What this suggests is that 
meaning is not only influenced by what exists in the network, it is also heavily influ- 
enced by what is absent, such as the absence of a relation that should exist. 

Also, judging the relevance of meaning relies largely on the evolution of good
numerical weights on edges, and learning the proper weights is admittedly a difficult
proposition. We point out, though, that even a rough estimate of weights (for example,
separating lexical and non-lexical knowledge by 0.5), as was employed in our
implementation, vastly improved the performance of meaning determination.

Though the complexity and knowledge requirements remain lingering challenges 
for the BLA, the implementation and indicative trials do seem to support our hypothe- 
sis that unpacking meaning from predefined word structures is a step toward a more 
natural handling of nuance and context in language. 



5 Related Work 

Ross Quillian’s Semantic Memory System [21] was the initial inspiration for this 
work, as it was one of the first to explore meaning being distributed over a graph. In 
the semantic memory system, Quillian sought to demonstrate some basic semantic 
capabilities over a network of word-concepts, namely, comparing and contrasting 
words. The relations initially proposed represented minimal structural dependencies, 
only later to be augmented with some other relations including proximity, conse- 
quence, precedence, and similarity. The type of knowledge represented in the network 
was denotative and dictionary-like. With the Bubble Lexicon, we attempt to build on 
Quillian’s work. We explain how such a semantic memory might be used to circum- 
vent the limitations of traditional lexicons. We populate the network with lexical and 
non-lexical knowledge, and demonstrate their influences on meaning. We give an 
account of context-sensitive meaning determination by modifying spreading activation 
to account for contextual and structural plausibility; and introduce connectionism as a 
vehicle for conceptual analogy and learning. 



6 Conclusion 

In this paper, we examined certain limitations that the word-as-prepackaged-meaning
paradigm imposes on the ability of the lexicon to generate highly nuanced
interpretations. We formulated these problems as issues of context, and hypothesized
that a context-centered design of the computational lexicon would lend itself better to
nuanced generation. We proposed a context-sensitive symbolic-connectionist network
called a Bubble Lexicon. Rather than representing words as static encapsulations of
meaning, the Bubble Lexicon dynamically generates context bubbles of meaning
which vary based on active contexts. The inclusion of non-lexical knowledge, such as
defeasible and higher-order conceptual knowledge, together with intrinsic weights on
relations, serves to nuance meaning determination. More than a static structure,
the Bubble Lexicon is a platform for performing nuanced linguistic assembly tasks
such as context-sensitive attribute attachment (e.g. “fast car”).

An implementation of the Bubble Lexicon over a large repository of commonsense
knowledge called OMCSNet yielded some promising findings. In indicative trials, context
had a very clear effect in nuancing the interpretation of phrases, lending support to our
hypothesis. However, these findings also tell a cautionary tale. The accuracy of a
semantic interpretation is heavily reliant on the concepts in the network being well
connected and densely packed, and on the numerical weights being properly learned. The
task of building lexicons over symbolic connectionist networks will necessarily have 
to meet these needs and manage a great deal of complexity. However, we are optimis- 
tic that the large repositories of world knowledge being gathered recently will serve as 
a well-populated foundation for such a lexicon. The research described in this paper 
explores lexicons that approach the generative power of the human language faculty. 
We cannot help but note that as such a lexicon theory grows toward its goal, it also 
approaches a comprehensive model of thought and semantic memory. 

Acknowledgements. We are indebted to the following individuals for their helpful
A Contextual Approach to the Logic of Fiction



Rolf Nossum 
Abstract. An algebraic variant of multi-context logic is considered as
an alternative to existing logical accounts of fictional discourse. An
associative and idempotent operator on reified fictions supersedes Woods’
olim modality. Soundness and completeness results are obtained for
certain inter-fictional deductive rules relative to semantical conditions which
respect the ‘authorial say-so’ criterion of fictional truth.



1 Introduction 

Fictional discourse has been a research programme for logicians at least since 
John Woods’ seminal investigations more than a quarter of a century ago 
[ ^VoofiHIWooTdJ . The problems encountered when attempting to model sentences 
of fiction are very tough, and to this day the field lacks a robust and widely ac- 
cepted logical foundation. This is, however, not for lack of vigorous efforts by 
many researchers during the intervening years, cfr i.a. \cmn . IBB71 . lUevVdI . 
[ Far75J . IbeaYbi . IffowYHl . IlLewYfjj , liUarYfjl , llUasYUl , lUabYUll , jlielTOj , IrouYUI , 
luav^al , |L()94IJ . ivlUUl . 

Giving a logical account of fiction poses more problems than analyzing dis- 
course about what is not the case in the real world. A logic of fiction must tackle 
thorny issues of representation and existence, simultaneous real and fictional 
reference, consistency, and nesting, to name a few. 

The aim of this paper is to present a logical framework which lacks some of 
the weaknesses that have been identified in existing logical accounts of fiction. 

The paper is structured as follows: In the next section we make some remarks 
intended to motivate the logical account of fiction that we shall give toward the 
end of this paper. Then we cite some intuitive benchmark properties that a 
logic of fiction should have, and review some existing systems in the literature, 
with special attention to the quantificational-substitutional account of Woods
[Woo74]. Then we proceed to give our own positive account, which is an adaptation
of the algebraic multi-context system of and conclude by evaluating it against
the intuitive benchmarks.

2 Remarks about Fiction 

Here are some general remarks about fiction, intended to illuminate central fea- 
tures of our approach: 



P. Blackburn et al. (Eds.): CONTEXT 2003, LNAI 2680, pp. 23.3- E^ 2003. 
© Springer- Verlag Berlin Heidelberg 2003 
Fictions are typically told by someone, and are thus not given a priori. On
the contrary, every fiction is given relative to the fiction-teller’s point of view. A
proper logical account of fiction should reflect this.

Fiction is not necessarily unrealistic, indeed some life-like stories have many 
readers and viewers. Vice versa, reality is itself sometimes portrayed as fiction 
(relative to some point of view); e.g. as a story (told by someone), or as a dream 
(dreamed by someone). Cf. “We are such stuff as dreams are made of ...”
[EM2j Logical accounts of fiction should therefore not grant too much of a 
special status to the real world. 

Fictions can be nested inside each other, as in telling a story about an author 
writing a book. This involves several points of view; the fiction-teller’s point of 
view x, the point of view y of the author-in-the-story, and the point of view z of
the fiction contained in the book-in-the-story. The multi-level pattern of nesting 
suggested here will be a main feature of our system. 

Stories can be mutually fictional relative to each other. In [Ra.bfif1| two stories 
develop in alternate chapters, each chapter spanning a day. Each story turns out 
to consist of the dreams dreamt at night in the other story, and the fictionological 
clou occurs when the protagonist of one story manages to kill that of the other. 
Here, the points of view β, γ of the protagonists of the two intertwined stories
are related in ways that should be investigated. 

A story about a time-traveller who accidentally prevents his father from 
meeting his mother should also be accommodated within a logic of fiction. It is,
after all, a fiction. 

Fiction frequently interacts with reality: The fictional detective Sherlock 
Holmes lived in London. This is the same non-fictional London that some of 
Conan Doyle’s readers live in, and others visit from time to time. 

Another example is Nicolas Bourbaki, a fictional mathematician whose work 
(by a collective using Bourbaki as a pseudonym) is not fictional. It has had a
great deal of influence on how mathematics is studied and taught in reality.

In modelling fiction, it seems appropriate to allow elements of the fiction 
to agree with elements from the point of view of the fiction-teller. In a first 
approximation, equal names may corefer, in an elaboration there can be an 
overlap of semantic domains, and in the account we shall give later, there will 
be a mapping of the language of the fiction into that of the fiction-teller. 

It is tenable that the fiction-reader’s point of view is augmented by what 
is read, resulting in a new point of view. If this is accepted, then the resulting 
fiction must be said to encompass both the reader and the story. The point of 
view of the reader is a non-trivial part of the fiction. 

Proceeding along this path, one might be inclined to blur the distinctions 
between such categories as point of view, reality, and fiction. One person’s reality 
is just a point of view to another. Holmes’ reality is Conan Doyle’s fiction. If 
Holmes’ aide Dr. Watson tells a fairytale to his grandchildren, (Did he have any? 
Let me just say-so!) the fiction deepens further. All along, additional fictional 
components must be understood through previous points of view, or fictional 
layers. 
On this view, then, fictions are structured as sequences of fictional layers. For
the time being, let us make the convenient assumption that adding a fictional
layer is an associative operation.

As a preliminary observation, it seems that the space of fictions is not tree- 
structured by the add-a-fictional-layer operation. It also seems reasonable to 
say that discourse wholly within one and the same point of view does not add 
another fictional layer, which suggests that adding a fictional layer will be an 
idempotent operation. 

3 Intuitive Benchmarks 

Perhaps the most important criterion by which to judge a logic of fiction is the 
extent to which it obeys common-sense intuitions about fictional discourse. Let 
us borrow the following five intuitive benchmarks from [WA02]:

A. Reference is possible to fictional beings even though they do not exist. 

B. Some sentences about fictional beings and events are true. 

C. Some inferences about fictional beings and events are correct. 

D. These three facts are made possible, in a central way, by virtue of the creative 
authority of the authors of fiction. Indeed, the primary and basic criterion 
of truth for fictional sentences is the author’s say-so. 

E. It is possible in a fictional truth to make reference to real things. For example, 
“Sherlock Holmes lived in London” is true and refers to the actual capital
city of Britain. 

4 Existing Accounts

Existing theories of fictionality vary greatly in their shape and form, but we may 
group them into broad categories according to their main focus. 

One group of theories is represented by fSea7,5IWa.178IWalflnj with their focus 
on speech acts, authorial pretense and make-believe analyses. In reference to the 
above intuitive axioms of fictionality, this may be said to focus on axiom D. 

An ontological point of view is taken by the theories of ^barV.4UouVJ7^ ] 
These admit the Meinongian view that objecthood does not necessarily entail 
existence, thus meeting benchmark A about reference to fictional beings. 

A third group of theories distinguishes between the logical forms of sentences 
in fictional and non-fictional discourse by direct or indirect use of some kind of 
fictionality operator. Some of these take a possible- world approach IPlaY4I Kapya. 
EabVULewVHJH;;^ . while the seminal work of John Woods IWoobifW o^^ 
takes a substitutional and quantificational approach. 

4.1 Woods’ Account 

The present work aims in part to ameliorate some perceived weaknesses in 
Woods’ original framework, so we pause for a brief review of [Woo74] by Howell,

as quoted in mm-- 
Fiction [is represented in] a formal language containing the olim (once-
upon-a-time) modal operator O. He proposes, roughly speaking, that 
this language be given a say-so semantics, coupled with a substitutional 
treatment of quantification, for references to the fictional, and a nor- 
mal Tarskian semantics, with objectual quantification for references to 
the real. The (imagined) fictional claim ’’Holmes squared the circle” is 
represented by 

O (Holmes squared the circle), 

for example. This latter sentence contains but is not identical to a self- 
contradiction, and Woods notes that in affirming this sentence (and so 
in affirming the fictive claim which it represents) we are therefore not 
ourselves affirming a self-contradiction. Thus one of the problems created 
by fictional inconsistency is circumvented. Woods urges in conclusion 
that his olim language and its associated semantics, if developed in detail, 
will let us solve all the other logical and metaphysical issues about fiction. 
[ESa5fi|p.355 

Some semantical rules quoted from Woods and Alward give an impression of the
technical aspects of Woods’ account:

1. If φ represents the usual symbolization of a sentence that occurs in
a work of fiction, or if φ logically follows from a consistent sentence
of this sort, then O(φ) is true.

2. If O(Fa) is true and if (x)(Fx ⊃ ¬Gx) is also true (with the variable
ranging over real objects), then so is O(¬Ga).

5. O(φ ∧ ψ) is true iff both O(φ) and O(ψ) are true.

Quantification is handled as follows:

8. If φ is ∃v_i O(F), then φ is true iff for every sequence s of the theory’s
objects at least one of two conditions is met:

i. O(F) contains free occurrences of the variable v_i, and v_i denotes
the i-th element of some sequence s′ differing from s in at most
the i-th place, α is the name of that element, χ is a substitution
instance of O(F) with respect to α, and χ meets the say-so
condition.

ii. If O(F) is O(χ(v_i, α)), then v_i denotes the i-th element of some
sequence s′ differing from s in at most its i-th place; that element
knows O∃v_k(v_k = α) to be true; the predicate χ is such that in
general χ(v_j, v_h) is semantically equivalent to v_j believes that
and the element denoted by vt believes that x(^j;n)- 

9. A further condition on quantifiers: if φ is O(∃v(ψ)), then φ is true
iff for some name or singular term α of L, free for a free variable in
F, O(S^v_α(F)) is true.
While Woods’ system was a great advance over previous semantical analyses,
and has greatly stimulated and influenced the research community, it has
also been subjected to strong criticism, as conceded in [WA02], from which we
paraphrase:

Howell and Routley independently observed that, in Woods’
system, inconsistent fictions narrate everything. More
specifically, if a fiction admits an inconsistency, it admits any
sentence at all. Formally, if O(φ ∧ ¬φ) is in a given story, then so
are O(¬ψ) and O(ψ) for arbitrary ψ.

Parsons observed an anomaly in the way Woods’ system handles
sentences which mix fictional and real-world references. The
requirement of epistemic intensionality in part (ii) of semantic
condition 8 seems to be the crux of this matter. Parsons
enjoyably calls the sentence

“Some fictional detective is more famous than any real-world detective”

a “bet-sensitive, indeed winning, claim”, but observes that
Woods’ system represents such sentences in the form
∃v O(φ(v, a)), and that nothing resembling an intensional verb
such as required by clause 8(ii) is present.

Furthermore, Parsons criticizes the deductive part of Woods’
system for being too unrestrictive. The deductive mechanisms
fail to suitably restrict inference to sentences made true by the
author’s say-so. This latter point may be the most severe
criticism against Woods’ logic of fiction.

5 A Multi-context Logic of Fiction

Let us write

$$x : \phi \qquad (1)$$

to mean that the statement φ is asserted in the fiction x. For example:

$$\text{Hound-of-B} : \mathit{domicile}(\text{Sherlock Holmes}, \text{London}) \qquad (2)$$

We can explicate the meaning of terms used in fictions through a mapping
into the language of the fiction-teller. Thus, we can map the term “London” in
Conan Doyle’s stories to “London” in reality as we know it, and establish the
correct meaning of references to London in that way:

$$M_{\text{Hound-of-B}}(\text{“London”}) = \text{“London”} \qquad (3)$$

But what about Sherlock Holmes, who does not occur in reality? 

It is plausible to think that a reader of fiction takes notice of each new character when they are first mentioned, and refers back to that notice upon subsequent mention.



What, then, is the notice arising from a first encounter with the term “Sherlock Holmes”? His name, and the fiction he occurs in, are all the information there is at that stage. Let us enrich the reader’s vocabulary with a fresh atom, and map the fictional detective to that:

$$\mathcal{H}_{\text{Hound-of-B}}(\text{“Sherlock Holmes”}) = \text{“Hound-of-B-fictional-Sherlock-Holmes”} \qquad (4)$$

By keeping the languages countable at all points of view, we avoid technical problems here.

When there are implicit equalities in the language of the fiction, the mapping $\mathcal{H}$ must respect them:

$$\text{if } t = u \text{ then } \mathcal{H}_x(t) = \mathcal{H}_x(u) \qquad (5)$$



In this way, “Sherlock”, “Sherlock Holmes”, and “Holmes” are all interpreted the same.
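As an illustration only, the mapping of Eqs. (3)-(5) can be sketched in Python. The function `make_mapping`, the alias table, and the naming scheme for fresh atoms are hypothetical choices made for this sketch, not part of the formal system: terms that the fiction treats as equal are first reduced to one canonical term (so that (5) holds), imported terms such as “London” map to their real-world counterparts as in (3), and every other character is mapped to a fresh atom as in (4).

```python
from typing import Callable, Dict


def make_mapping(fiction: str,
                 aliases: Dict[str, str],
                 imports: Dict[str, str]) -> Callable[[str], str]:
    """Return a hypothetical term mapping H_fiction for one fiction.

    aliases: term -> canonical term, encoding equalities t = u in the fiction.
    imports: canonical terms that refer to real-world entities, with their images.
    """
    def H(term: str) -> str:
        canon = aliases.get(term, term)   # equal terms share one image, Eq. (5)
        if canon in imports:
            return imports[canon]         # e.g. "London" -> "London", Eq. (3)
        # otherwise mint a fresh atom for a purely fictional entity, Eq. (4)
        return f"{fiction}-fictional-{canon.replace(' ', '-')}"
    return H


H = make_mapping(
    "Hound-of-B",
    aliases={"Sherlock": "Sherlock Holmes", "Holmes": "Sherlock Holmes"},
    imports={"London": "London"},
)
print(H("London"))  # London
print(H("Holmes"))  # Hound-of-B-fictional-Sherlock-Holmes
```

Routing every term through a canonical representative before looking up its image is one simple way to guarantee that equal terms can never receive different images.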

With this notation we can apply the results of [NS02], and adapt their multi-context algebraic systems to the logic of fiction.

The two central rules for reasoning in systems of fictions structured like the contexts of [NS02] are the following:



$$\frac{u : \mathcal{H}_x(\phi)}{ux : \phi}\ R_{\mathrm{UP}_x} \qquad\qquad \frac{ux : \phi}{u : \mathcal{H}_x(\phi)}\ R_{\mathrm{DW}_x} \qquad (6)$$



Here, ux is the fiction obtained by adding fictional component x to fiction 
u. The operation of adding a fictional component is fundamental to our system, 
and reality, fictions, contexts, points of view, are all lumped into one and the 
same category and are represented as sequences of fictional components strung 
together associatively. 
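The two rules in (6) only move a formula between a fiction and its extension, wrapping or unwrapping the mapping. The Python sketch below is illustrative and makes two simplifying assumptions: it treats $\mathcal{H}_c(\phi)$ as an uninterpreted wrapper `Mapped(c, phi)` rather than an actual translation between languages, and it restricts $x$ to a single component $c$. The class and function names, and the use of "reality" as a component, are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Fiction = Tuple[str, ...]  # a composite fiction as a sequence of components


@dataclass(frozen=True)
class Mapped:
    """Symbolic stand-in for H_c(phi): formula phi of fiction uc, seen from u."""
    component: str
    formula: object


@dataclass(frozen=True)
class Labelled:
    """A labelled formula u : phi."""
    fiction: Fiction
    formula: object


def r_up(f: Labelled) -> Labelled:
    """R_up: from u : H_c(phi) infer uc : phi."""
    if not isinstance(f.formula, Mapped):
        raise ValueError("R_up applies only to formulas of the form H_c(phi)")
    return Labelled(f.fiction + (f.formula.component,), f.formula.formula)


def r_dw(f: Labelled) -> Labelled:
    """R_dw: from uc : phi infer u : H_c(phi)."""
    if not f.fiction:
        raise ValueError("R_dw needs a fiction with at least one component")
    *u, c = f.fiction
    return Labelled(tuple(u), Mapped(c, f.formula))


fact = Labelled(("reality", "Hound-of-B"), "domicile(Sherlock Holmes, London)")
lowered = r_dw(fact)
print(lowered.fiction, lowered.formula.component)  # ('reality',) Hound-of-B
print(r_up(lowered) == fact)                       # True
```

Applying `r_dw` and then `r_up` returns the original labelled formula, mirroring the fact that the two rules are inverse to each other.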

Mathematically, this amounts to an associative algebra on terms denoting fictions. Technically speaking, we are structuring the space of fictions as a semigroup.

As observed earlier, addition of a fictional layer to itself is idempotent: 



$$uu = u \qquad (7)$$

As shown in [Nos02], idempotence is within the scope of the theory of [...], so we are now in a position to make direct use of the latter. It gives sound and complete deductive rules for a class of multi-context systems of which the present one is a member.

For simplicity of presentation we restrict ourselves to the case in which the 
language of each fiction is propositional, and leave the first order case out for 
now. 

The main definitions are given below, but for proofs and other technical details we refer the reader to [...] and [Nos02].
5.1 Languages and Mappings

Let C be a countable set of contexts, or fictions, given a priori, and let C* 
denote the set of finite sequences of elements of C, without adjacent identical 
pairs. The condition corresponds to the idempotence of fiction composition, as 
discussed above. 
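A minimal Python sketch of $C^*$, assuming fictions are represented as tuples of component names: composition is concatenation followed by collapsing adjacent identical components. This enforces exactly the restriction in the definition of $C^*$ (so $cc = c$ for a primitive $c$). The helper names `normalize` and `compose` are hypothetical.

```python
from typing import List, Sequence, Tuple


def normalize(components: Sequence[str]) -> Tuple[str, ...]:
    """Collapse adjacent identical components, so the result lies in C*."""
    out: List[str] = []
    for c in components:
        if not out or out[-1] != c:
            out.append(c)
    return tuple(out)


def compose(u: Sequence[str], v: Sequence[str]) -> Tuple[str, ...]:
    """Compose two fictions: concatenate, then renormalize."""
    return normalize(tuple(u) + tuple(v))


print(compose(("a",), ("a",)))          # ('a',)
print(compose(("a", "b"), ("b", "c")))  # ('a', 'b', 'c')
```

Because collapsing adjacent repeats yields a unique normal form for every sequence, `compose` is associative, which is what makes this representation a semigroup.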

We use a, b, c, d, sometimes subscripted, to denote primitive fictions from C, 
while t, u, v, w, x, y, z and their subscripted variants are used liberally to denote
primitive fictions from C, composite fictions from C*, or composite fictions from 
C\ 

For each fiction $u \in C^*$ let there be a propositional language $L_u$ that is used to express sentences in this fiction.

Definition 1 (Well-formed formulae) Well-formed formulae are defined as follows (for all $y \in C^*$): if $\phi$ is a propositional formula in $L_y$, then $y.\phi$ is a well-formed formula, and $\phi$ is called a $y$-formula.



Definition 2 (Language mapping) For all $c \in C$, there is a partial recursive function $\mathcal{H}_c$ that maps $uc$-formulae into $u$-formulae for arbitrary $u \in C^*$.

Intuitively, a language mapping from $uc$ to $u$ states which part of the content of the fiction $uc$ is represented in the fiction $u$, and how.



5.2 Local Model Semantics 

Every equivalence class of fictions, i.e. each $u \in C^*$, has its own formula language $L_u$. The semantical structure we are about to define takes as its basic building blocks the local interpretations of each language $L_u$. We can identify interpretations with subsets of $L_u$, i.e. the true formulas in each interpretation.

The semantical structure for the entire system of languages reflects the way 
in which fictions are augmented by adding fictional components. We start by 
defining ground extensions of fiction terms: 

Definition 3 ($x$-continuation) Given $x \in C^*$, an $x$-continuation is a fiction

$$x c_1 c_2 \cdots c_h$$

where $0 \le h$ and $c_i \in C$ for $1 \le i \le h$. When $h = 0$, this is just $x$.
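Definition 3 can be made concrete by enumerating continuations up to a bounded length. The Python sketch below is illustrative only: it assumes a finite $C$ (the text allows a countable one), represents fictions as tuples, and reduces each continuation to its $C^*$ form by collapsing adjacent repeats. The function names are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Sequence, Tuple


def normalize(components: Sequence[str]) -> Tuple[str, ...]:
    """Collapse adjacent identical components (the C* normal form)."""
    out: List[str] = []
    for c in components:
        if not out or out[-1] != c:
            out.append(c)
    return tuple(out)


def continuations(x: Sequence[str], C: Iterable[str], max_h: int) -> List[Tuple[str, ...]]:
    """All distinct x-continuations x c1 ... ch with 0 <= h <= max_h."""
    components = list(C)
    result: List[Tuple[str, ...]] = []
    for h in range(max_h + 1):
        for suffix in product(components, repeat=h):  # h = 0 yields the empty suffix
            y = normalize(tuple(x) + suffix)
            if y not in result:
                result.append(y)
    return result


print(continuations(("a",), ["a", "b"], 2))
# [('a',), ('a', 'b'), ('a', 'b', 'a')]
```

Note that $x$ itself is always its own continuation (the case $h = 0$), and that appending the last component of $x$ again yields nothing new, because of idempotence.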



Definition 4 ($x$-chain) For $x \in C^*$, an $x$-chain $m$ is a function which maps every $x$-continuation $y$ to a set $m_y$ of interpretations of $L_y$ (the local models of fiction $y$), such that for some $x$-continuation $y$, $m_y$ is not empty, and for all $x$-continuations $y$ and fictions $c \in C$:

1. $m \models y.\mathcal{H}_c(\phi)$ if and only if $m \models yc.\phi$;

2. the cardinality of $m_y$ is at most 1.
Definition 5 (Satisfiability) An $x$-chain $m$ satisfies a formula $y.\phi$, where $y$ is an $x$-continuation, in symbols $m \models y.\phi$, if for any $s \in m_y$, $s \models \phi$ according to the definition of satisfiability for propositional formulae.
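The following Python sketch spells out Definitions 4 and 5 for a toy propositional language. The encoding is an assumption made for illustration: formulae are atoms or nested tuples such as `("not", f)`, an interpretation is the frozenset of its true atoms, and a chain is a dict from fictions to sets of at most one interpretation. Only condition 2 of Definition 4 is checked; condition 1 is not. Satisfaction is vacuous when $m_y$ is empty, as the universal quantifier in Definition 5 implies.

```python
from typing import Dict, FrozenSet, Set, Tuple, Union

Formula = Union[str, tuple]       # atom, ("not", f), ("and", f, g), ("or", f, g)
Interpretation = FrozenSet[str]   # the set of atoms true in the interpretation
Chain = Dict[Tuple[str, ...], Set[Interpretation]]


def holds(s: Interpretation, phi: Formula) -> bool:
    """Classical propositional satisfaction s |= phi."""
    if isinstance(phi, str):
        return phi in s
    op = phi[0]
    if op == "not":
        return not holds(s, phi[1])
    if op == "and":
        return holds(s, phi[1]) and holds(s, phi[2])
    if op == "or":
        return holds(s, phi[1]) or holds(s, phi[2])
    raise ValueError(f"unknown connective {op!r}")


def chain_satisfies(m: Chain, y: Tuple[str, ...], phi: Formula) -> bool:
    """m |= y.phi iff every local model s in m_y satisfies phi."""
    models = m.get(y, set())      # a fiction absent from the dict has no models
    if len(models) > 1:
        raise ValueError("an x-chain allows at most one local model per fiction")
    return all(holds(s, phi) for s in models)


m: Chain = {("a",): {frozenset({"p"})}, ("a", "b"): set()}
print(chain_satisfies(m, ("a",), "p"))                             # True
print(chain_satisfies(m, ("a",), ("not", "p")))                    # False
print(chain_satisfies(m, ("a", "b"), ("and", "q", ("not", "q"))))  # True (vacuously)
```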



Definition 6 (Logical consequence) A formula $x.\phi$ is a logical consequence of a set of formulae $\Gamma$, in symbols $\Gamma \models x.\phi$, if, for any $z$-chain $m$ such that $x$ is a $z$-continuation, if $m \models \{y.\psi \in \Gamma \mid x \neq y \text{ and } y \text{ is a } z\text{-continuation}\}$, then for all $s \in m_x$, $s \models \{\psi \mid z.\psi \in \Gamma, x = z\}$ implies that $s \models \phi$.



5.3 Reasoning between Fictions 



The notion of $x$-continuations induces a partial order among fictions, each fiction preceding its continuations. Let us see how one moves between composite fictions which are related in this partial order. We rely on a natural deduction calculus extended with indices as described in [...] and [...], plus the following inter-fictional deduction rules:



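$$\frac{u.\mathcal{H}_c(\phi)}{uc.\phi}\ R_{\mathrm{UP}_c} \qquad\qquad \frac{uc.\phi}{u.\mathcal{H}_c(\phi)}\ R_{\mathrm{DW}_c} \qquad (8)$$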


UC.fj) <-->■ 

u.Tfifi) ^ 




(9) 


$$\frac{u.\phi}{uu.\phi}\ \mathrm{IDEM1} \qquad\qquad \frac{uu.\phi}{u.\phi}\ \mathrm{IDEM2} \qquad (10)$$



We say that a formula $u.\phi$ is derivable from a set $\Gamma$ of formulae, in symbols $\Gamma \vdash u.\phi$, if there is a deduction of $u.\phi$ from $\Gamma$ that uses $u$-rules (as defined in [...]) and the above inter-fictional deduction rules.



6 Soundness and Completeness 

We have the following soundness and completeness result:

Theorem 1 (Soundness and Completeness) $\Gamma \models u.\phi$ if and only if $\Gamma \vdash u.\phi$.

6.1 Soundness

Soundness of $R_{\mathrm{DW}_c}$ and $R_{\mathrm{UP}_c}$ follows directly from item 1 of the chain conditions, and RRI is sound by virtue of item 2 of the chain conditions. Soundness of IDEM1 and IDEM2 follows because any $u$-chain is also a $uu$-chain and vice versa.



A Contextual Approach to the Logic of Fiction 






6.2 Completeness 

The completeness proof relies on canonical models which respect the bridge 
rules. The basic building blocks will be maximal consistent sets of well-formed 
formulae. Let us state the versions of consistency and maximality that we need. 

Definition 7 ($x$-consistency) A finite set $\Delta$ of well-formed formulae is said to be $x$-consistent iff $\Delta \nvdash x.\bot$, and an infinite set is $x$-consistent iff every finite subset is $x$-consistent.



Definition 8 ($x$-maximality) A set $\Delta$ of well-formed formulae is said to be $x$-maximal iff $\Delta$ is $x$-consistent and for all well-formed labelled formulae $y.\delta$ such that $\Delta \cup \{y.\delta\}$ is $x$-consistent, $y.\delta \in \Delta$.

Theorem 2 (Lindenbaum) Any x-consistent set of wffs can be extended to 
an x-maximal set. 

Proof: see [NS02].



Canonical model. Now let us choose an arbitrary $x$-consistent wff $x.\delta$ and construct an $x$-chain for it. To begin with, we expand $\{x.\delta\}$ to an $x$-maximal set $\Delta$ by the construction in the previous lemma.

Definition 9 (Canonical model) For all $x$-continuations $y = x c_1 \ldots c_h$:

- let $\Delta_y = \{\lambda \mid x.\mathcal{H}_{c_1}(\ldots \mathcal{H}_{c_h}(\lambda)) \in \Delta\}$;
- let $S_y$ be the set of interpretations of the language $L_y$;
- let $T_y = \{s \in S_y \mid s \models \Delta_y\}$ be the subset of interpretations that validate $\Delta_y$;
- and let $m$ be the function that maps $y$ to $\emptyset$ if $T_y = \emptyset$ and to $\{t\}$ otherwise, where $t$ is some arbitrary member of $T_y$.

Our canonical model is the $x$-chain $m$.



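$\Delta_y$ is well-defined, so $m$ is really an $x$-chain. To see this, we prove that

$$x.\mathcal{H}_{c_1}(\ldots \mathcal{H}_{c_h}(\lambda)) \in \Delta \quad\text{iff}\quad x.\mathcal{H}_{d_1}(\ldots \mathcal{H}_{d_k}(\lambda)) \in \Delta$$

whenever

$$x c_1 \ldots c_h = x d_1 \ldots d_k.$$

In fact,

$$\begin{aligned}
& x.\mathcal{H}_{c_1}(\ldots \mathcal{H}_{c_h}(\lambda)) \in \Delta && \\
\text{iff}\ & ((x c_1) \ldots c_h).\lambda \in \Delta && \text{by } h \text{ applications of } R_{\mathrm{UP}} \\
\text{iff}\ & x c_1 \ldots c_h.\lambda \in \Delta && \text{by associativity} \\
\text{iff}\ & x d_1 \ldots d_k.\lambda \in \Delta && \text{by hypothesis} \\
\text{iff}\ & ((x d_1) \ldots d_k).\lambda \in \Delta && \text{by associativity} \\
\text{iff}\ & x.\mathcal{H}_{d_1}(\ldots \mathcal{H}_{d_k}(\lambda)) \in \Delta && \text{by } k \text{ applications of } R_{\mathrm{DW}}
\end{aligned} \qquad (11)$$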

As regards the model conditions, the $m$ we have defined here trivially fulfills condition 2, and condition 1 is fulfilled because for $c \in C$ and an $x$-continuation $y = x c_1 \ldots c_h$ we have

$$y : \mathcal{H}_c(\lambda) \in \Delta \quad\text{iff}\quad yc : \lambda \in \Delta$$

by $R_{\mathrm{UP}}$, $R_{\mathrm{DW}}$, and $x$-maximality of $\Delta$.

The $x$-chain $m$ satisfies the wff $x.\delta$ (take $h = 0$), so we have completeness.

7 Conclusions 

Now, how does our system fare with respect to the intuitive criteria A-E introduced earlier? Quite well, we maintain:

A. Reference is possible to fictional beings even though they do not exist. 

Each fiction is endowed with its own local language, with unrestricted 
freedom of reference to local fictional entities. The interface with 
other fictions in general, and with the real world in particular, is 
through language mappings which explicate the meaning of imported 
references. 

B. Some sentences about fictional beings and events are true. 

Each local language has its own truth assignment. 

C. Some inferences about fictional beings and events are correct. 

Rules (8), (9), and (10) tell us which ones are correct.

D. These three facts are made possible, in a central way, by virtue of the creative 
authority of the authors of fiction. Indeed, the primary and basic criterion of 
truth for fictional sentences is the author’s say-so. 

Again, the local languages have full autonomy with respect to their 
assignment of truth and their consequence relation. There is noth- 
ing to prevent a fiction from having exactly the set of truths that 
correspond to an author’s say-so. 

E. It is possible in a fictional truth to make reference to real things. For example, 
“Sherlock Holmes lived in London” is true and refers to the actual capital
city of Britain. 

The meaning of references which migrate between fictions and the real world is explicated through the language mappings $\mathcal{H}_c$. In the example, the name “London” in the language of the detective story would, when imported into the real world, be mapped to the name “London” in the language of the real world, and the reference to the capital of Britain would be secured.
We have presented a logic of fiction which is a straightforward application of the Multi-Context language and Local Models Semantics of [GG01, SG01, ...], and which measures well against the above benchmarks. However, much is still left for future study: for instance, a more thorough comparison with existing logics of fiction should be made, and the applicability of analogy-related work to this line of inquiry remains to be investigated.
Abstract. This work discriminates external and internal visual context
according to a recently determined terminology in computer vision. It is 
conceptually based on psychological findings in human perception that 
stress the utility of visual context in object detection processes. The 
paper outlines a machine vision detection system that analyzes exter- 
nal context and thereby gains prospective information from rapid scene 
analysis in order to focus attention on promising object locations. A 
probabilistic framework is defined to predict the occurrence of object 
detection events in video in order to significantly reduce the computa- 
tional complexity involved in extensive object search. Internal context is 
processed using an innovative method to identify the object’s topology 
from local object features. The rationale behind this methodology is the 
development of a generic cognitive detection system that aims at more 
robust, rapid and accurate event detection from streaming video. Per- 
formance implications are analyzed with reference to the application of 
logo detection in sport broadcasts and provide evidence for the crucial 
improvements achieved from the usage of visual context information. 



1 Introduction 

In computer vision, object detection, i.e., the recognition of relevant events, is a highly challenging task in outdoor environments. Changing illumination, different weather conditions, and noise in the imaging process are the most important issues that require a truly robust detection system. This paper considers the exploitation of visual context information for the prediction of object location and identity, which would significantly improve the quality of service in the real-time interpretation of image sequences.

Research on video analysis has recently been focusing on object-based interpretation, e.g., to refine semantic interpretation for the precise indexing and sparse representation of immense amounts of image data [...]. Real-time object detection, such as for video annotation and interactive television [...], imposes increased challenges on resource management to maintain sufficient quality of service, and requires careful design of the system architecture.

* This work is funded by the European Commission’s IST project DETECT under grant number IST-2001-32157.
Fig. 1. External and internal visual context as a means to trigger discrimination pro-
cesses for the purpose of object detection in video streams. 



Fig. 1 illustrates how external and internal visual context are used to detect object information in a video stream. Rapid extraction of the context of a scene (the global spatial context with respect to object detection) might trigger early determination of regions of interest (ROI) and support careful usage of resources for more complex discrimination processes (Section 3). Within the ROI, object identification requires a grouping of local information [...]. In particular, the presented work describes how internal context from a configuration of object appearances (the local spatial context with respect to object detection) is exploited to distinguish collections of local measurements by means of their geometrical relations (Section 4). Finally, a federation of discriminatory processes is controlled by a supervising decision-making agent [...] to feed object information into a database for statistical performance evaluations.

Recent work on real-time interpretation applies attentional mechanisms to coarsely analyze the external context from the complete video frame information in a first step, reject irrelevant hypotheses, and iteratively apply increasingly complex classifiers with an appropriate level of detail [32, 30]. In addition, context priming makes sense out of globally defined environmental features to set priors on observable variables relevant for object detection. Investigations in experimental psychology on the binding between scene recognition and object localization have produced clear evidence that highly local features play an important role in facilitating detection from predictive schemes. In particular, the visual system infers knowledge about stimuli occurring in certain locations, leading to expectancies regarding the most probable target in the different locations (location-specific target expectancies [...]).

Extraction of internal object context often optimizes a single-stage mapping from local features to object hypotheses [...]. This requires either complex classifiers that suffer from the curse of dimensionality and require prohibitive computing resources, or rapid simple classifiers that lack specificity. Cascaded object detection has been proposed to decompose the mapping into a set of classifiers that operate on a specific level of abstraction and focus on a restricted classification problem. We investigate the impact of local spatial context information on the performance of object detection processes, using a Bayesian method to extract context from object geometry.

The methodology to exploit visual context is embedded within a global framework for the integrated evaluation of object- and scene-specific context (see [3]), with the rationale of developing a generic cognitive detection system that aims at more robust, rapid and accurate event detection from dynamic vision.

2 Visual Context in Object Detection Processes 

In general, we understand context to be described in terms of information that needs to be observed and that can be used to characterize a situation [...]. We refer to the ontology and formalization recently defined with reference to perceptual processes for the recognition of activity [...], and to a Bayesian framework on context statistics with particular reference to video-based object detection processes.

In a probabilistic framework, object detection requires the evaluation of 

$$p(\varphi, \sigma, \mathbf{x}, o_i \mid \mathbf{y}), \qquad (1)$$

i.e., the probability density function of object $o_i$ at spatial location $\mathbf{x}$, with pose $\varphi$ and size $\sigma$, given image measurements $\mathbf{y}$. A common methodology is to search the complete video frame for object-specific information. In cascaded object detection, a search for simple features yields an initial partitioning into object-relevant regions of interest (ROIs) and a background region.

The visual context is composed of a model of the external context of the embedding environment, plus a model of the object’s internal context, i.e., the object’s topology characterized by geometric structure and associated local visual events (e.g., local appearances) [...], so that local information becomes characterized with respect to the object’s model (e.g., Fig. 5). Measurements $\mathbf{y}$ are separated into local object features $\mathbf{y}_L$ representing object information and the corresponding local visual environment represented by context features $\mathbf{y}_E$. Assuming that, given the presence of an object $o_i$ at location $\mathbf{x}$, the features $\mathbf{y}_L$ and $\mathbf{y}_E$ are independent, we follow [...] to decompose Eq. (1) into

$$p(\mathbf{y} \mid \varphi, \sigma, \mathbf{x}, o_i) = p(\mathbf{y}_L \mid \varphi, \sigma, \mathbf{x}, o_i) \cdot p(\mathbf{y}_E \mid \varphi, \sigma, \mathbf{x}, o_i). \qquad (2)$$

Cascaded object detection leads to an architecture that proceeds from simple to complex visual information and derives from global to local object hypotheses (e.g., [...]). Reasoning processes and learning might be involved to select the most appropriate information according to an objective function and to learn to integrate complex relationships into simple mappings. They are characterized by tasks, goals, and states defined with respect to a model of the process, and by actions that enable transitions between states [...], much in the sense of a decision-making agent controlling discriminatory processes to improve quality of service in object detection (Fig. 1).
[Fig. 2 diagram: Recognition of Visual Context, Attention Module, Object Detection]



Fig. 2. Concept for recognition of external context for attention. Landmarks are extracted from the scene and matched against a simple scene model. Bayesian recognition enables evidence integration over time and space. Attentive predictions on the location of embedded objects finally instantiate a complex object classifier ([...], Section 4) that verifies or rejects the object hypotheses.



3 External Context for ROI Detection 

The concept is to propose attention from scene context using knowledge about forthcoming detection events that has been built up in repeated processing of the scene before. This knowledge, which is derived from a simple scene model, is activated by rapid feature extraction (e.g., using color regions) in order to operate only in those image regions where object detection events will most likely occur. The localization within an already modeled video scene is based on a Bayesian prediction scheme. Recent investigations on human visual cognition give evidence for memory in visual search, underlining the assumption that even simple modeling mechanisms significantly support the quality of service in object detection.

3.1 Scene Representations from Landmarks 

The basis for landmark-based localization within a video scene is the extraction of discriminative and robustly re-locatable chunks of visual information in the scene. Landmarks have been efficiently defined on local grey-value invariants, on color and edge features based on local appearance, and on distinguished regions [...].

We apply an approach that rapidly extracts color and shape features but also considers the contrast of the extracted landmark region with respect to the corresponding features of its local neighborhood. This is motivated by human perception, where, e.g., color is addressed by attentional mechanisms in terms of its diagnostic function [...]. Note that any other choice of local landmark representation would allow the methodology described in the following sections to be pursued as well.






Predictive Visual Context in Object Detection 






[Fig. 3 panels (a)-(e); panel (d) shows the binary shape pattern:]

$$\begin{pmatrix} 0&0&1&1&1\\ 0&0&1&1&1\\ 0&0&1&1&1\\ 0&1&1&1&0\\ 0&1&1&1&0\\ 1&1&0&0&0 \end{pmatrix}$$

Fig. 3. Characteristic landmark features. (a,b,c) Color based ROIs for landmark defi- 
nition, denoting the ROI border and the variance ellipsoid of the spatial distribution 
of ROI member pixels. (c,d) Class based extraction of shape: (c) Sampling (crosses) 
within the landmark region, (d) binary pattern received from color class based interpre- 
tation of the pixels sampled in (c), and attributed to class 4 in (e). (e) All prototypical 
patterns of shape to classify (d). 





Fig. 4. Triple configuration of landmarks in a sample video frame using the landmark extraction described in Section 3.1.



In order to increase the discriminability of the locally extracted context, it is useful to combine landmarks into geometric configurations of 1-, 2-, and 3-tuples of landmarks (Fig. 4). Tuples of localized image properties have specific characteristics of scale invariance, ordering and topology that make them attractive for landmark usage. Each single landmark region is encoded by a vector $\lambda$ with landmark-specific components $v_i = (c, n, s, \ldots)$, with features being vector-coded by color ($c$), contrast ($n$), and shape ($s$). A 3-tuple landmark configuration is denoted $\lambda = [v_1, v_2, v_3, \alpha]^T$, where $\alpha$ encodes the angles between the landmarks $v_i$.
3.2 Bayesian Scene Recognition

The goal of rapid scene modelling is to provide a simple and efficient encoding of the environment. The presented work is based on the localization of a given landmark within a complete video sequence. The extracted landmark yields a hypothesis that it is a sample of a physical identity $l_i$ of a landmark, i.e., the real landmark that generates a distribution of features from appearances to the observer. In our model, we pursue a framework of recognition and attribute each landmark sample $\lambda$ to a physical landmark identity $l_i$ and associated semantic blocks (frames) $f_j$ in the reference (training) video sequence.

A simple scene model is rapidly generated from the frames of a video training sequence in terms of a list of landmark vectors $\lambda \in \Lambda$ that can be matched against a currently extracted landmark sample $\lambda^*$. Scene recognition from the interpretation of a landmark $l^*$ is then computed via $l^* = \arg\min_i \|\lambda^* - \lambda(l_i)\|$, which represents a nearest-neighbor matching to stored landmarks $\lambda(l_i)$ in ’$\lambda$-space’.
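The nearest-neighbour step can be sketched in a few lines of Python. This sketch assumes that each landmark configuration has already been encoded as a fixed-length numeric vector (how colour, contrast, shape and angles are coded numerically is not specified here) and uses the Euclidean norm. The function name and the toy scene model are hypothetical.

```python
import math
from typing import Sequence


def recognize_landmark(sample: Sequence[float],
                       stored: Sequence[Sequence[float]]) -> int:
    """Return i* = argmin_i ||lambda* - lambda(l_i)|| over the stored landmarks."""
    if not stored:
        raise ValueError("the scene model contains no landmarks")
    return min(range(len(stored)), key=lambda i: math.dist(sample, stored[i]))


scene_model = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
print(recognize_landmark([0.9, 1.2, 1.0], scene_model))  # 1
```

A linear scan is adequate for a small scene model. For larger ones, a spatial index such as a k-d tree would be the usual replacement, though the text does not say which structure was used.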

In order to represent the uncertainty in landmark classification, the sample distribution specific to landmark $l_i$ is modelled using a unimodal Gaussian, $\mathcal{N}(\mu_{l_i}, \Sigma_{l_i})$. The posterior interpretation of a landmark configuration $\lambda$ is then outlined as follows.



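$$P(l_i \mid \lambda) = \frac{p(\lambda \mid l_i)\, P(l_i)}{p(\lambda)} = \frac{p(\lambda \mid l_i) \sum_j P(l_i \mid f_j)\, P(f_j)}{p(\lambda)} \qquad (3)$$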



where $\lambda$ denotes a sample landmark extraction from a test image, $P(l_i \mid \lambda)$ is the posterior with respect to a corresponding physical identity of a landmark, and $P(l_i \mid f_j)$ is the probability of observing a physical landmark given a specific frame of the video sequence. To be precise, we require the $f_j$ to partition the space of landmarks $l_i$, which is the case in video block segmentation.
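Eq. (3) can be evaluated directly once the Gaussian parameters and the frame statistics are known. The Python sketch below is an illustration under two assumptions that the text does not state: the covariances are diagonal, and the landmark identities are exhaustive, so that $p(\lambda)$ is obtained by summing the numerators over $i$. All names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Density of a Gaussian with diagonal covariance diag(var) at x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def landmark_posterior(lam: Sequence[float],
                       means: Sequence[Sequence[float]],
                       variances: Sequence[Sequence[float]],
                       p_l_given_f: Sequence[Sequence[float]],
                       p_f: Sequence[float]) -> List[float]:
    """P(l_i | lambda) for every landmark identity, following Eq. (3).

    p_l_given_f[i][j] = P(l_i | f_j) and p_f[j] = P(f_j).
    """
    numerators = []
    for i in range(len(means)):
        prior = sum(p_l_given_f[i][j] * p_f[j] for j in range(len(p_f)))  # P(l_i)
        numerators.append(gaussian_diag(lam, means[i], variances[i]) * prior)
    evidence = sum(numerators)  # p(lambda), assuming the l_i are exhaustive
    return [n / evidence for n in numerators]


post = landmark_posterior(
    lam=[0.0, 0.0],
    means=[[0.0, 0.0], [3.0, 3.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
    p_l_given_f=[[0.5, 0.5], [0.5, 0.5]],
    p_f=[0.5, 0.5],
)
print(post)  # first entry close to 1
```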



3.3 Contextual Cueing to Predict Object Detection Events 

Assuming that the scene has been viewed repeatedly and in a prevalent direction, each landmark configuration can be associated with a pointer to a succeeding object event that has been extracted before using any highly accurate, computationally expensive object identification method. In the scene model, directional information in terms of an angle interval $(\beta \pm \alpha)$ is provided, in which the object event is completely embedded: $\beta$ is the direction of the center of the predicted detection event, and $\pm\alpha$ designates an angle interval such that the detection event is completely embedded within it. This interval $\pm\alpha$ defines the standard deviation of a one-dimensional normal distribution, i.e., $\mathcal{N}(\mu_\beta, \alpha)$, that is defined geometrically normal to the straight line originating in landmark $l_i$ with angle $\beta$. In total, these operations define a probability density function (PDF) on the image, $p(\mathbf{x} \mid \Omega, \lambda)$, with image locations $\mathbf{x}$ carrying confidence information about the support for a local object detection event out of the set of objects $\Omega$, in terms of a landmark-specific confidence map (Fig. 6). However, in real-time implementations, Monte-Carlo sampling [...] would be appropriate to approximate the estimated PDFs.
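A dense version of a landmark-specific confidence map can be sketched as follows. This is an illustrative simplification and not the authors’ implementation. It weights each pixel by a Gaussian in its angular deviation from the predicted direction $\beta$ (with standard deviation $\alpha$), rather than in the perpendicular distance to the ray, and it returns unnormalized confidences. Angles are in radians and image rows grow downwards. The function name and parameters are hypothetical.

```python
import math
from typing import List, Tuple


def confidence_map(width: int, height: int, origin: Tuple[int, int],
                   beta: float, alpha: float) -> List[List[float]]:
    """Confidence that an object event lies at each pixel, for one landmark.

    origin: landmark position (x, y); beta: predicted direction; alpha: spread.
    """
    ox, oy = origin
    cmap = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if (x, y) == (ox, oy):
                continue  # direction is undefined at the landmark itself
            angle = math.atan2(y - oy, x - ox)
            # deviation from beta, wrapped into [-pi, pi]
            dev = math.atan2(math.sin(angle - beta), math.cos(angle - beta))
            cmap[y][x] = math.exp(-0.5 * (dev / alpha) ** 2)
    return cmap


cmap = confidence_map(11, 11, origin=(0, 5), beta=0.0, alpha=0.1)
print(cmap[5][10])  # 1.0: this pixel lies exactly along the predicted direction
```

The dense grid evaluation is used here only for clarity. As noted above, a real-time system would rather approximate the PDF by Monte-Carlo sampling.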
To increase the robustness of the approach, we integrate the confidences from those landmarks $l_k \in K$ that have been consecutively visited in an observation sequence and selected as estimators for the forthcoming object location, e.g., simply using a naive Bayes estimator,

$$p(\mathbf{x}(\beta) \mid \Omega, \lambda_1, \ldots, \lambda_K) = \prod_{k=1}^{K} p(\mathbf{x}(\beta) \mid \Omega, \lambda_k), \qquad (4)$$

and thereby obtain an incremental fusion of individual confidence maps. Fusion might use all those predictions $\mathcal{N}(\cdot)$ that correspond to the selected $k$ giving $P(l_i \mid \lambda)$ (Section 3.2), weighting individual contributions according to the confidences given in Eq. (3) (Fig. 6).
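The naive Bayes fusion of Eq. (4) amounts to a pixel-wise product of the individual confidence maps. In the Python sketch below, the optional per-landmark weights are applied as exponents. This is one hypothetical way to weight contributions by $P(l_i \mid \lambda)$, and not necessarily the scheme used in the paper. The fused map is renormalized to sum to one.

```python
from typing import List, Optional, Sequence

Map = List[List[float]]


def fuse_confidence_maps(maps: Sequence[Map],
                         weights: Optional[Sequence[float]] = None) -> Map:
    """Pixel-wise (optionally weighted) product of confidence maps, renormalized."""
    if weights is None:
        weights = [1.0] * len(maps)
    height, width = len(maps[0]), len(maps[0][0])
    fused = [[1.0] * width for _ in range(height)]
    for cmap, w in zip(maps, weights):
        for y in range(height):
            for x in range(width):
                fused[y][x] *= cmap[y][x] ** w
    total = sum(sum(row) for row in fused)
    if total == 0.0:
        raise ValueError("the confidence maps have no common support")
    return [[v / total for v in row] for row in fused]


a = [[0.5, 0.5], [0.0, 1.0]]
b = [[1.0, 0.0], [0.5, 0.5]]
print(fuse_confidence_maps([a, b]))  # [[0.5, 0.0], [0.0, 0.5]]
```

A pixel that any single landmark rules out (confidence 0) stays ruled out after fusion, which is the behaviour sketched in Fig. 6, where the fused map suppresses the erroneous single-landmark prediction.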

4 Internal Context from Probabilistic Structural 
Matching 

Recently, the requirement to formulate object representations on the basis of local information has been broadly recognized [...]. Crucial benefits of decomposing the recognition of an object from global into local information are increased tolerance to partial occlusion, improved accuracy of recognition (since only relevant, i.e., most discriminative, information is queried for classification), and genericity of local feature extraction that may index into high-level object abstractions. In this paper we use simple brightness information to define local appearances, but the proposed approach is general enough to allow any intermediate, locally generated information to be used as well, such as Gaussian filter banks [...].

Context information can be interpreted from the relation between local object features [...] or within the temporal evolution of an object’s appearance [...]. Decomposing the complete object information into local features transforms $p(o_i \mid \mathbf{y}_L)$ into $p(o_i \mid \mathbf{y}_{L_1}, \ldots, \mathbf{y}_{L_N})$, where $N$ determines the size of the object-specific environment. The grouping of conditionally observable variables into an entity of semantic content, i.e., a visual object, is an essential perceptual process [...].

The relevance of structural dependencies in object localization has been stressed before, though the existing methodologies merely reflect co-location in the existence of local features. The presented work outlines a full evaluation of geometrical relations in a framework of probabilistic structural matching, using Bayesian conditional analysis of local appearances as follows.

Geometrical information is derived from the relation between the stored object model (the trajectory in feature space) and the actions (shifts of the focus of attention) that are mapped to changes in the model parametrization (e.g., a change in viewpoint, i.e., $\Delta\varphi$). Figure 5 illustrates the described concept in the reference frame of the local-appearance-based object model. The geometry between local appearances is now explicitly represented by the shift actions $a_i$ (deterministically causing $\Delta\varphi_i$) that feed directly into Bayesian fusion [23] by

$$P(o_i, \varphi_j \mid \mathbf{y}_1, a_1, \mathbf{y}_2) = \alpha\, P(o_i, \varphi_j \mid \mathbf{y}_1, a_1)\, p(\mathbf{y}_2 \mid o_i, \varphi_j, \mathbf{y}_1, a_1), \qquad (5)$$

where $\alpha$ is a normalizing constant.
Fig. 5. Spatial context from the geometry of local information. A single appearance might give rise to evidence for multiple objects (at crossings of manifolds), even from a second measurement. The shift action to change a visual parameter of the manifold (e.g., a viewpoint change) and the associated appearances are then matched against the manifold’s trajectory in feature space to discriminate between object hypotheses.



Spatial context from probabilistic structural matching is now exploited using the conditional term $P(o_i, \varphi_j \mid \mathbf{y}_1, a_1)$: the probability of observing view $(o_i, \varphi_j)$ as a consequence of the deterministic action $a_1 = \Delta\varphi_1$ must be identical to the probability of having measured at the action’s starting point before, i.e., at view $(o_i, \varphi_j - \Delta\varphi_1)$; thus $P(o_i, \varphi_j \mid \mathbf{y}_1, a_1) = P(o_i, \varphi_j - \Delta\varphi_1 \mid \mathbf{y}_1)$. Note that this clearly does not represent a naive Bayes classifier, since it explicitly represents the dependency between the observable variables $\mathbf{y}_1$ and $a_1$.

Furthermore, the probability density of $\mathbf{y}_2$, given knowledge of the view $(o_i, \varphi_j)$, is conditionally independent of previous observations and actions, and therefore $p(\mathbf{y}_2 \mid o_i, \varphi_j, \mathbf{y}_1, a_1) = p(\mathbf{y}_2 \mid o_i, \varphi_j)$. The recursive update rule for conditionally dependent observations accordingly becomes

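$$P(o_i, \varphi_j \mid \mathbf{y}_1, a_1, \ldots, a_{N-1}, \mathbf{y}_N) = \alpha\, p(\mathbf{y}_N \mid o_i, \varphi_j)\, P(o_i, \varphi_j - \Delta\varphi_{N-1} \mid \mathbf{y}_1, a_1, \ldots, \mathbf{y}_{N-1}) \qquad (6)$$

and the posterior, using $Y_N = \{\mathbf{y}_1, a_1, \ldots, a_{N-1}, \mathbf{y}_N\}$, is then given by

$$P(o_i \mid Y_N) = \sum_j P(o_i, \varphi_j \mid Y_N). \qquad (7)$$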
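The recursion in Eqs. (5)-(7) can be sketched on a discretized viewpoint axis. In the Python example below, which is illustrative only, the views $\varphi_j$ are equally spaced bins on a circle, each action $a_n$ is an integer shift of $\Delta\varphi_n$ bins, and the appearance model `likelihood(y, o, j)`, standing for $p(\mathbf{y} \mid o_i, \varphi_j)$, is a hypothetical table lookup. A uniform prior over $(o_i, \varphi_j)$ is assumed.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Belief = Dict[str, List[float]]  # belief[o][j] = P(o, phi_j | observations so far)


def _normalize(belief: Belief) -> Belief:
    """Apply the normalizing constant alpha."""
    total = sum(sum(v) for v in belief.values())
    return {o: [p / total for p in v] for o, v in belief.items()}


def structural_matching(observations: Sequence[Hashable],
                        actions: Sequence[int],
                        likelihood: Callable[[Hashable, str, int], float],
                        objects: Sequence[str],
                        n_views: int) -> Dict[str, float]:
    """Return P(o_i | Y_N) after fusing y_1, a_1, ..., a_{N-1}, y_N."""
    # First observation with a uniform prior over objects and views.
    belief = _normalize({o: [likelihood(observations[0], o, j) for j in range(n_views)]
                         for o in objects})
    for y, shift in zip(observations[1:], actions):
        # P(o, phi_j | ..., a) = P(o, phi_j - delta_phi | ...): shift the view axis.
        shifted = {o: [belief[o][(j - shift) % n_views] for j in range(n_views)]
                   for o in objects}
        # Eq. (6): multiply by p(y | o, phi_j) and renormalize.
        belief = _normalize({o: [likelihood(y, o, j) * shifted[o][j] for j in range(n_views)]
                             for o in objects})
    # Eq. (7): marginalize over the viewpoint.
    return {o: sum(belief[o]) for o in objects}


# Toy appearance model: both objects look alike from view 0 (a manifold crossing).
appearance = {"A": [0, 1, 2, 3], "B": [0, 3, 2, 1]}


def likelihood(y, o, j):
    return 0.9 if appearance[o][j] == y else 0.1


print(structural_matching([0], [], likelihood, ["A", "B"], 4))      # roughly 0.5 each
print(structural_matching([0, 1], [1], likelihood, ["A", "B"], 4))  # favours "A"
```

A single ambiguous appearance leaves both hypotheses equally likely. Only after the shift action and a second measurement does the geometry between the two appearances separate the objects, as Fig. 5 describes.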

The experimental results in Figures 1 ^ and 1 ^ demonstrate that context is 
crucial for rapid discrimination from local object information. The presented 
methodology assumes knowledge about (i) the scale of actions and (ii) the
directions with reference to the orientation of the logo, which can be gained by 
ROI analysis beforehand. 
Fig. 6. Recursive contextual cueing and spatial attention for object detection. (a) Original frame with extracted landmark configurations. (b,d) Confidence maps derived from 2 individual landmarks. (c) Confidence map after Bayesian integration, depicting confidence beyond the threshold of θ = 0.9. (e) Accurate and (f) inaccurate predictive search regions. (g) The integrated confidence map contains the target. Since the target is represented in the fused confidence map, landmarks 1 and 2 would impose object hits. In contrast, single-landmark evaluation of 1 and 2 would produce one erroneous result.



5 Experimental Results 

The object detection experiments were performed on ’Formula One’ sport broadcast image sequences. The proposed object detection system first applies ROI detection based on contextual cueing from landmark configurations, supported by some color-specific pattern classifiers [24, 21]. Within these detection regions, it extracts internal context from local features. The following paragraphs describe the recognition performance (i) from external context and (ii) using probabilistic structural matching (internal context).

(i) Context from landmark configurations. The experiments were conducted on prediction of object detection in ’Formula One’ broadcast videos. In particular, a video sequence of 71 frames (of 795 x 596 pixels) was used as a training sequence and analysed to set up the scene model of the complete sequence, i.e., the interpretation of the landmark information, configurations, and the associated indexing and probabilistic interpretation for Bayesian scene recognition (Section 3.2).

Table 1. Performance analysis of contextual cueing for attentive detection of objects of interest. Using extensive image analysis, ca. 73.1% of the image had to be analyzed in order to detect a logo. In contrast, using CCA (contextual cueing for attention) analysis, only 46.1% of the complete image had to be processed, resulting in a gain of 36.7% of unprocessed image parts. CCA provides impressive gains not only in speedup but also in the statistically estimated accuracy of object detection, as illustrated in Fig. 7b.

Extensive analysis (%)    CCA analysis (%)    Unprocessed image part, CCA (%)
73.1                      46.1                36.7

The ROI color information was clustered into 12 unimodal Gaussian kernels
via expectation maximization (EM). Shape patterns were likewise clustered into 12
classes. The interpretation of this sequence resulted in 4351 n-tuple landmark
registrations from 2123 physical landmark identities. The attribution to
detection events was performed manually, under the assumption that this
particular scene is captured by a specific camera motion (left to right), so that
events are always encountered from the same direction.
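The EM clustering step can be pictured with a compact NumPy implementation of a Gaussian mixture with diagonal covariances. This is a generic textbook EM sketch, not necessarily the authors' implementation: the diagonal covariances, the farthest-point initialization, the variance floor `eps`, and the name `em_gmm` are all assumptions. With the setting reported above it would be called with `k=12` on the ROI color vectors.

```python
import numpy as np

def em_gmm(X, k, n_iter=100, seed=0, eps=1e-6):
    """Fit a k-component Gaussian mixture with diagonal covariances by EM.

    X: (n, d) array of feature vectors (e.g. ROI color values).
    Returns mixture weights (k,), means (k, d), variances (k, d) and the
    log-likelihood recorded at each E-step.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Farthest-point initialization: start from a random sample, then
    # repeatedly add the sample farthest from the means chosen so far.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        dist = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2), axis=1)
        idx.append(int(np.argmax(dist)))
    means = X[idx].copy()
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    trace = []
    for _ in range(n_iter):
        # E-step: log of w_c * N(x | mu_c, diag(var_c)) for every sample and component.
        diff = X[:, None, :] - means[None, :, :]                      # (n, k, d)
        log_pdf = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                          + np.sum(np.log(2.0 * np.pi * variances), axis=1))
        log_joint = log_pdf + np.log(weights)                          # (n, k)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)                            # responsibilities
        trace.append(float(log_norm.sum()))
        # M-step: re-estimate weights, means and (floored) variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None] + eps
    return weights, means, variances, trace
```

Farthest-point initialization is chosen here only because it is simple and deterministic for well-separated clusters; on real color data a k-means or random-restart initialization may be more robust.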

From the localized landmarks, the subsequent detection event can be
predicted. The error per single prediction is on average 2.6° (standard deviation ±6.39°).
A direct hit rate of 93.7% is achieved within the 2 x interval (Fig. 7(a)). The
resulting ROC curve (Fig. 7(b)) interprets the contextual cueing method as
a detection classifier and shows excellent object
detection performance. Finally, Table 1 illustrates the gain in resources due to
contextual cueing.
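The evaluation quantities in this paragraph (mean and spread of the angular prediction error, the hit rate within a tolerance interval, and the points of an ROC curve) can be computed as sketched below. The helpers are hypothetical: they assume predictions and ground truth are given as directions in degrees, and that each candidate detection carries a confidence score and a binary label. Tied scores are treated as separate thresholds for simplicity.

```python
import numpy as np

def angular_error_deg(pred, true):
    """Absolute angular difference in degrees, wrapped into [0, 180]."""
    diff = (np.asarray(pred, dtype=float) - np.asarray(true, dtype=float) + 180.0) % 360.0 - 180.0
    return np.abs(diff)

def hit_rate(errors, tolerance):
    """Fraction of predictions whose error does not exceed the tolerance."""
    return float(np.mean(np.asarray(errors) <= tolerance))

def roc_points(scores, labels):
    """False and true positive rates obtained by lowering the threshold one score at a time."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores)
    tp = np.cumsum(labels[order])
    fp = np.cumsum(~labels[order])
    return fp / max(1, int((~labels).sum())), tp / max(1, int(labels.sum()))

# Usage with made-up numbers (not the paper's data).
errors = angular_error_deg([350.0, 12.0, 95.0], [10.0, 10.0, 90.0])
print(errors.mean(), errors.std(), hit_rate(errors, 5.0))
```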



(ii) Context from geometry. Spatial context from geometry can easily be
extracted given a predetermined estimate of the scale and orientation of the
object of interest. This estimate is computed (i) from the topology of the ROI and
(ii) from position-conditioned estimates of scale and orientation derived from global
image features [...]. We present a recognition experiment using spatial context on 3 selected
logos (Figure 8(a)), with the local appearance representation described above
and a 3-dimensional eigenspace representation to model highly ambiguous visual
information. Figure 9 (right) demonstrates the dramatic decrease of uncertainty
in the pose information for object $o_2$, i.e., $p(o_2, \varphi_j \mid y)$, over several steps of
information fusion according to Eq. (6). Figure 8(b) illustrates the original and
final distributions for all objects, $o_1$ to $o_3$.
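A 3-dimensional eigenspace of the kind mentioned here is commonly obtained by principal component analysis of training imagettes. The sketch below is an assumption-laden illustration rather than the authors' pipeline: PCA via SVD, and an isotropic Gaussian likelihood around a stored eigenspace prototype for each (object, pose) hypothesis, with a hypothetical width `sigma`. Its output could serve as the likelihood table $p(y \mid o_i, \varphi_j)$ in a recursive fusion of the form of Eq. (6).

```python
import numpy as np

def build_eigenspace(train, dims=3):
    """PCA via SVD. train: (n_samples, n_pixels), one flattened imagette per row."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:dims]  # rows of the basis span the `dims`-dimensional eigenspace

def project(x, mean, basis):
    """Eigenspace coordinates of one or more flattened imagettes."""
    return (np.asarray(x, dtype=float) - mean) @ basis.T

def pose_likelihoods(g, prototypes, sigma=1.0):
    """Hypothetical p(g | o_i, phi_j): isotropic Gaussian (up to a constant factor)
    around the eigenspace prototype of each (object, pose) hypothesis.

    g          -- (dims,) eigenspace coordinates of the observed imagette
    prototypes -- (n_objects, n_poses, dims) array of stored coordinates
    """
    d2 = np.sum((prototypes - g) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / sigma ** 2)
```

With only three dimensions, many (object, pose) prototypes lie close together, which is why a single imagette yields an ambiguous distribution and fusion over several imagettes is needed.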
Fig. 7. Performance evaluation of the contextual cueing system. (a) Tolerated error
and associated percentage of predictions (%) within this interval (for n-tuples of landmarks:
point = 1-tuple, dot-dashed = 2-tuple, dashed = 3-tuple landmarks, line = average). (b)
Receiver operating characteristic (ROC) curve demonstrating high object detection
capability when contextual cueing is interpreted as a detection system.



6 Conclusions 

Context information contributes in several aspects to robust object detection 
from video. This work presents a predictive framework to focus attention on 
detection events instead of extensively searching the complete video frame for 
objects of interest. 

Firstly, the probabilistic recognition of scenes from a landmark-based description
of the scene context is the innovative component that enables rapid,
predictable, and robust determination of relevant search regions. Secondly,
grouping of local features can be applied rapidly and yields improved results.
Additional computation derives context from the geometry of local features, which
has been demonstrated to dramatically improve object recognition. Further
experiments on contextual cueing demonstrate that predicting object events
from landmark-based scene context can decisively determine an efficient focus
of attention, saving a substantial amount of the computational
resources otherwise spent on extensive processing.

Fig. 8. (a) The logo object set and associated pattern test sequence. (b) Probability
distribution on pose hypotheses w.r.t. all 3 logo objects from a single imagette
interpretation (top) and after the 5th fusion of local evidences (bottom).

Fig. 9. Left: Probability distributions over pose hypotheses (imagette pose no. 1-131
within logo) from individual test patterns no. 1-6, from top to bottom. Right: Corresponding
fusion results using spatial context from geometry, illustrating fusion steps
no. 1-5.

Future work will focus on extracting local context from scene information
in order to predict the future locations of detection events. We will consider
the temporal context in the occurrence of landmark configurations, which we
expect to improve both landmark-based scene recognition and prediction
performance.
Abstract. Stalnaker ([9]) and much subsequent work argues that we should
model the common ground as a set of possible worlds, but there are different 
ways in which one might imagine doing so. This paper asks which way is most 
appropriate when it comes to explaining how language works. The paper looks 
at a certain restriction on the use of copular questions, and suggests that ac- 
counting for this restriction is easier given one way of modeling the common 
ground than given others. 



1 A Question about Language 

This paper will revolve around a particular fact. To get a handle on this fact, consider 
the question in (1). 

(1) Who do you think is John? 

There are certain things that we cannot use this question to ask. In particular, we
cannot use it to ask what function John performs. To see what I mean, imagine Scenario
I. In this kind of scenario, I couldn’t use the question in (1) to ask what instrument
John plays in the trio. I couldn’t use it, that is, to ask for information of the kind
we would express with a sentence like (2). It is interesting to note that (1) contrasts in
this respect with the sentence in (1’), which I can use to ask what instrument John
plays in the trio¹ - so in some way this restriction on the use of (1) relates to the fact
that who originates in subject position, and not in object position as in (1’).



¹ For reasons not obvious to me, the corresponding which-question, Which (one) do you think
John is?, sounds awkward as a request for this kind of information. The awkwardness
goes away when the “range” of which one is linguistically determined, as in Which (one) do 
you think John is, the cellist or the violinist? Incidentally, I suspect that one can make an 
argument similar to the one I make in this paper by considering sentences of the latter kind: 
relevant would be the contrast on Scenario II between Which (one) do you think is John, the 
guy on the left or the guy on the right? (good) and Which (one) do you think is John, the vio- 
linist or the cellist? (odd, in my judgment odder than Which one do you think John is,...). 
Scenario I. When we arrive at the piano trio concert, a friend of ours tells us that he is going
to introduce us to a couple of the musicians. He brings over two men in tuxedos and intro- 
duces one as John (the one on the left), and the other (the one on the right) as Bill. At this 
point, we know that John is one of the musicians, but we don’t know whether he is the cel- 
list, violinist or pianist. Likewise for Bill. 

(1’) Who do you think John is?

(2) John is the violinist. 

What can we use the question in (1) to ask? To see an example, consider a slight 
variation on our first scenario, which I have labelled Scenario II. On this scenario, 
while again it would be odd to use the question in (1) to ask what instrument John 
plays, I could use it to ask for information of the kind we would express with an
answer like (3).² On Scenario I, this kind of question is infelicitous - the answer is
already common knowledge - but on Scenario II, where the answer is not common
knowledge, it is clear that the question can have this use. 

Scenario II. We weren’t paying attention, and our backs were turned when the introductions
were made. When we turn around, we see those two people in front of us, but we don’t 
know which one got introduced to us as John and which one got introduced to us as Bill. 

(3) John is the guy on the left. 

With this in mind, here is the fact that the paper will revolve around. If we take a 
copular question of the form Who is John? where who originates in subject position 
((4)), we cannot use it to ask what function John performs, but we can use it to ask 
which of the people in front of us got introduced as John. 

(4) Who₁ is₂ [IP t₁ t₂ John ]

(structure before wh-movement and auxiliary raising: [IP who is John ] )

To point you towards this fact, I had to ask you to think about the question Who do 
you think is John? rather than Who is John? This is because the main auxiliary moves 
above the subject in English questions, and so the question Who is John? is actually 
ambiguous between two different structures, one in which who originates in subject 
position and one in which who originates in object position. But the question Who do 
you think is John? is not ambiguous in this way: there it is clear that who originates in
subject position, and so there we can see our fact: structures where who originates in 
subject position - structures like (4) - cannot be used to ask what function John per- 
forms. (To convince yourself of the same thing in another way, you could think about 
the contribution that the embedded interrogatives make to the sentences in (5). These 
too are cases where who clearly originates in subject position, since the embedded 
auxiliary doesn’t move above the embedded subject.) 

(5) a. Guess [who is John] (cf. a’. Guess [who John is ] ) 

b. May I ask [who is John] (cf. b’. May I ask [who John is ] )



² Or alternatively the answer The guy on the LEFT is John. I think there is some variation as
to which form of the answer is preferred. 
One concern of this paper will be to ask: What is the right way to describe this
restriction on the use of (4)? And why is the use of (4) restricted in this way?

2 A Question about Information States 

That was a question about language: why can a structure of the kind in (4) convey 
some things but not others? Now here is a different kind of problem. I want to con- 
sider our first problem in the context of this second one. 

Suppose we assume that the right way to model information states, and in particular 
common grounds, is as a set of possible worlds. That is, suppose we imagine à la
Stalnaker that there is a “context set” that consists of the open candidates for the ac- 
tual world as far as the participants in the conversation are concerned. Suppose we 
also assume (pace Lewis) that individuals exist across worlds. Now, with these as- 
sumptions in mind, recall our second scenario, the one in which our backs were turned 
at the point when introductions were made. The problem is: how should we model the 
common ground that the scenario leads to? At the point when I ask you the question 
in (1), what kinds of worlds are in our context set? 

(The more general problem that this is getting at, of course, is: how should we 
model the common ground in cases where we would say that it is not yet part of our 
presumed common knowledge who is John.) 

Here are two ideas that we might entertain. 

2.1 Idea A 

On Idea A, there are two kinds of worlds in our “context set.” In some, on the left is a 
certain individual j who exists throughout our possible worlds, and on the right is a 
certain individual b who exists throughout our possible worlds. In others, it’s the other 
way around. The information that we lack by virtue of having had our backs turned 
when introductions were made is the information that would allow us to exclude one 
of these bunches of worlds. 




(6) w1: j is on the left, b is on the right
    w2: b is on the left, j is on the right



The information that we lack by having our backs turned is the information that 
someone could supply us with by telling us the sentence in (7). On Idea A, if we ac- 
cept the sentence (7) as true, this is how we adjust our context set: we keep those 
worlds in which we find j on the left (the w1 worlds), but throw out those worlds in
which j is not on the left (the w2 worlds). That is, on Idea A, to talk about John is to 
talk about j and to talk about Bill is to talk about b. 



(7) The guy on the left is John. 
2.2 Idea B

On Idea B, again there are two kinds of worlds in our “context set,” and the informa- 
tion that we lack by virtue of having our backs turned - the information that (7) com- 
municates - is the information that would allow us to exclude one of these kinds of 
worlds. But the distinction between the two groups of worlds is of a different nature. 

On Idea B, unlike on Idea A, the same individual - call him x - is on the left in all 
worlds in the context set, and the same individual - call him y - is on the right in all 
worlds in the context set. Where the worlds differ has to do with the properties that x 
and y have. In some worlds, x has a property that for convenience we might call the 
“John” property, and y has a property that for convenience we might call the “Bill” 
property. In others, it’s the other way around. 

(8) w1: x has the “John” property, y has the “Bill” property
    w2: x has the “Bill” property, y has the “John” property

On Idea B, how do we adjust our context set when we receive the information that a 
sentence like (7) expresses? We keep those worlds in which the guy on the left has the 
“John” property - that is, those worlds in which x has the “John” property, the w1
worlds - and we throw out those worlds in which the guy on the left does not have the 
“John” property - the w2 worlds. So basically, on Idea B, to talk about John is to talk 
about the property we called the “John” property and to talk about Bill is to talk about 
the property we called the “Bill” property. These are properties that an individual may 
have in some worlds but not others.³
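To make the contrast concrete, here is a toy model of the two context sets for Scenario II, under the simplifying assumptions that a world can be represented as a small dictionary and a proposition as a function from worlds to truth values. The keys and the names `IDEA_A`, `IDEA_B` and `update` are illustrative only. Under both ideas, accepting (7) removes exactly the w2 worlds; what differs is what the worlds are taken to vary in.

```python
# Toy model of Scenario II (illustrative names, not from the paper).
# A world is a dict; a proposition is a function from worlds to bool.

# Idea A: worlds differ in WHICH individual stands where;
# "John" always denotes the trans-world individual j.
IDEA_A = [
    {"left": "j", "right": "b"},  # w1
    {"left": "b", "right": "j"},  # w2
]

def sentence_7_idea_a(w):
    """(7) The guy on the left is John: j is on the left in w."""
    return w["left"] == "j"

# Idea B: the same individuals x (left) and y (right) in every world;
# worlds differ in who has the "John" and "Bill" properties.
IDEA_B = [
    {"left": "x", "right": "y", "john": "x", "bill": "y"},  # w1
    {"left": "x", "right": "y", "john": "y", "bill": "x"},  # w2
]

def sentence_7_idea_b(w):
    """(7) The guy on the left is John: the left individual has the "John" property in w."""
    return w["left"] == w["john"]

def update(context_set, proposition):
    """Accepting a sentence as true keeps exactly the worlds where it yields true."""
    return [w for w in context_set if proposition(w)]

print(update(IDEA_A, sentence_7_idea_a))  # only the w1 world survives
print(update(IDEA_B, sentence_7_idea_b))  # only the w1 world survives
```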

What I will argue now is that the idea we adopt about how to model the common 
ground can determine how we look at the problem we started out with - the problem 
of why a question of the form in (4) can convey some things but not others. Interest- 
ingly, one idea about how to model the common ground, because it encourages us to 
look at the problem in a particular way, could make a direction for solution more 
evident than would another idea. In that case, we would have a reason to favor one 
idea over the other. I will suggest that this is indeed the case. 

(Incidentally: In the Appendix I mention a third idea about how to model the common 
ground that Scenario II leads to. I don’t discuss it in the text because it departs from 
an assumption that I don’t want to depart from here - the assumption that different 




³ But what kinds of properties are they? One way of taking the mentality behind Idea B is that
these properties are the properties of occupying a particular position in the repertoire that we
keep of individuals important to us. I will leave the possibilities open. One should certainly
ask under what kinds of assumptions about the nature of worlds, individuals and properties
Idea B can be made sense of.
information states are to be modeled as different sets of worlds. I include it in the
paper because it might be more intuitive than the two ideas I have set out here, and
because I think the discussion in the text has some bearing on this idea too.) 

3 Tackling These Questions 

Let us now return to our initial fact: in a situation in which we have been introduced to 
someone named “John,” (4) can ask which of the salient people is the one so intro- 
duced, but not which function that person performs. One aspect of this fact happens to 
be that, in the scenarios we considered, (4) does not request information of the kind 
(9a) would contribute, but does request information of the kind (9b) would contribute. 
Let us ask how the different ideas of how to model the common ground lead us to 
think about this fact. Importantly, the different ideas of how to model the common 
ground will mean different ways of characterizing the kind of information that (4) can 
and cannot request. 

(9) a. # John is the violinist. b. John is the guy on the left. 

I will conduct this discussion from the standpoint of a few assumptions about how 
questions work. The background assumption is about declarative sentences: I assume 
that, in general, declarative sentences⁴ express propositions - functions from worlds to
truth values - and that, when we accept a sentence as true, we eliminate from the
context set those worlds for which the proposition does not yield 1. Questions, I
assume, are instructions to express one proposition in a certain set that the question⁵
makes relevant; a felicitous answer will then be a sentence whose denotation is
contextually equivalent to one of these propositions⁶. (Propositions φ and ψ are
contextually equivalent with respect to a context set C when, for every world w in C, φ is true
in w iff ψ is true in w.) Bearing this in mind, what do Ideas A and B suggest as far as
the set of propositions that the question instructs us to choose among? 

Let us start with Idea B, the idea under which there is a “John” property that poten- 
tially holds of different people in different worlds. On Idea B, we can entertain the 
hypothesis that the set (4) instructs us to choose among is a set of propositions that 
vary with respect to an individual - a subset of (10). 

(10) { λw. n = the person who has the “John” property in w | n an individual }

some propositions in this set:

λw. x = the person who has the “John” property in w

λw. y = the person who has the “John” property in w

⁴ More precisely, the LF of a declarative sentence. Sentences that have different LFs are
potentially ambiguous between different propositions.

⁵ More precisely, the LF of a question. Questions that have different LFs are potentially
ambiguous between different kinds of instructions. (See Section 4.)

⁶ More precisely, a sentence whose LF has a denotation that is contextually equivalent to one
of these propositions.

This hypothesis enables us to explain as follows why the question does not elicit
responses like (9a) in the scenarios we considered. The idea would be that (9a) plausibly
expresses the proposition in (11). Since on a scenario like ours it is not known
which of the individuals in front of us plays the violin, (11) will not convey what any 
of the propositions in (10) conveys, and so (9a) will not be a felicitous answer. In a 
similar way, this hypothesis will enable us to explain why the question does elicit 
responses like (9b). Idea B suggests an analysis of (9b) along the lines of (12) (this 
should be clear from our exposition of Idea B). Given that in all of the worlds in our 
context set the guy on the left is the same individual x, (12) is contextually equivalent 
to one of the propositions in (10): the proposition λw. x = the person who has the
“John” property in w.

(11) λw. the person who has the “John” property in w = the violinist in w

(12) λw. the person who has the “John” property in w = the guy on the left in w⁷
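The contextual-equivalence reasoning for (11) and (12) can be checked mechanically in a toy model. The sketch assumes a four-world context set for Scenario II under Idea B, in which x is on the left and y on the right throughout, while both the bearer of the “John” property and the violinist vary; all names are illustrative. It finds that (12) is contextually equivalent to the member of (10) with n = x, whereas (11) matches no member of (10).

```python
from itertools import product

PEOPLE = ("x", "y")  # x is on the left, y on the right, in every world (Idea B)

# Context set for Scenario II: who has the "John" property and who plays
# the violin are both open (illustrative four-world model).
C = [{"left": "x", "right": "y", "john": j, "violinist": v}
     for j, v in product(PEOPLE, repeat=2)]

def equivalent(p, q, context_set):
    """Contextual equivalence: p and q agree on every world of the context set."""
    return all(p(w) == q(w) for w in context_set)

# (10): { λw. n = the person who has the "John" property in w | n an individual }
SET_10 = {n: (lambda w, n=n: w["john"] == n) for n in PEOPLE}

def prop_11(w):
    """(11): the "John" person in w = the violinist in w."""
    return w["john"] == w["violinist"]

def prop_12(w):
    """(12): the "John" person in w = the guy on the left in w."""
    return w["john"] == w["left"]

def matching_members(answer):
    """Individuals n whose (10)-proposition the answer is contextually equivalent to."""
    return [n for n, q in SET_10.items() if equivalent(answer, q, C)]

print(matching_members(prop_12))  # ['x']: (9b) is a felicitous answer
print(matching_members(prop_11))  # []: (9a) is not
```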

Now suppose instead we adopt Idea A. Unlike on Idea B, it doesn’t look as though
we are going to get anywhere by saying that (4) instructs us to choose from among a
set of propositions that vary with respect to an individual. It seems more promising to
say that the set is made up of propositions that vary with respect to an individual
concept - a subset of (13). But there is an important difference between the approach that
Idea B suggests and the approach that Idea A suggests. On Idea A, the subset must be
restricted in such a way as to exclude concepts like the violinist (while still including concepts
like the guy on the left). By contrast, on Idea B, where we dealt with a set of
propositions that varied with respect to an individual, no individuals needed to be excluded.

(13) { λw. F(w) = j | F a function from worlds to individuals }

some propositions in this set (abbreviated):

λw. the person on the left in w = j
λw. the person on the right in w = j

(14) λw. j = the violinist in w

To see that the subset must be restricted, note that, on Idea A, the natural way of 
analyzing our (9a) is as expressing the proposition in (14). If the subset were not 
restricted, it would include (14), and so we would wrongly predict (9a) to be a possible 
answer on a scenario like Scenario I or Scenario II.

To summarize, when we ask what our ideas of how to model the common ground 
suggest about what the set of propositions is that (4) asks us to choose among, this is 
what we find: in a certain sense, the proposition set that Idea A points to is more con- 
strained than the proposition set that Idea B points to. This suggests a moral. The 
tool for deciding between Idea A and Idea B should be the theory that we adopt of 
what propositions a question makes relevant. The deciding factor will be how easy it 
is to account for the different constraints on proposition sets that the different views 
force us to. If it is easy for our theory to account for the constraints on proposition 
sets that we are forced to once we adopt one of these views, that is a point in favor of 
the view; if hard, that is a point against. 



⁷ This is shorthand: to “be on the left” in w is roughly to be standing on the left of where we
Right now I see that there is a direction to pursue in order to account for Idea B’s
less constrained proposition sets - that is what I will outline next. I don’t see any 
direction when it comes to Idea A’s more constrained proposition sets. So it seems 
to me that the balance tips in favor of Idea B. 

4 A Syntactic Puzzle 

Our initial question was: why does (4) mean what it does? On Idea B, this question 
becomes: why does (4) make relevant propositions that vary with respect to an
individual? I will now pull another item out from our toolbox: a theory that links the
propositions a question makes relevant to what the question’s syntactic structure is. I 
will show that, if we take for granted a Heim and Kratzer ([6])-style theory of how 
syntactic structures are interpreted, then by assuming Idea B we can see our initial 
question as a syntactic puzzle. The puzzle is that there seem to be limitations on what 
syntactic structures we can generate for a question pronounced Who is John where 
who originates in subject position. The fact that we can transform our question into a 
syntactic puzzle means that we potentially open up areas for investigation. 

Here is the theory I assume in what follows. The basic idea is that a question’s LF 
determines what proposition set it makes relevant. Questions that are syntactically 
ambiguous in the sense that they admit different LFs are therefore also ambiguous in 
the sense that the different structures make different proposition sets relevant. How 
does the LF of a question determine a proposition set? LFs are interpreted along the 
lines given in Heim and Kratzer. In the case of questions, I assume, interpretation 
works in such a way that the denotation of a question’s LF is a function from worlds to
sets of propositions. At the same time, it is only licit to use a question when it yields
the same proposition set for every world in the context set⁸ - so that is how the single
set of propositions arises.

Given this background, imagine that we also assume Idea B. We can then explain 
why (4) means what it does if we can guarantee that (4) only admits LFs with a deno- 
tation like (15). (15) will lead to propositions that vary with respect to an individual. 

(15) λu. { λw. n = the person who has the “John” property in w | n is a person in u }
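A toy rendering of (15) and of the condition of use may help. It assumes, for illustration only, two worlds that contain the same persons x and y and differ only in who has the “John” property, and it represents each proposition extensionally as the set of (indices of) worlds where it is true, so that the proposition sets computed at different worlds can be compared for identity. `WORLDS`, `question_15` and `licit` are hypothetical names.

```python
# Illustrative two-world model (Idea B): the persons present are x and y in
# both worlds; the worlds differ only in who has the "John" property.
WORLDS = [
    {"persons": frozenset({"x", "y"}), "john": "x"},  # world 0
    {"persons": frozenset({"x", "y"}), "john": "y"},  # world 1
]

def prop(holds):
    """A proposition, represented extensionally as the set of world indices where it holds."""
    return frozenset(i for i, w in enumerate(WORLDS) if holds(w))

def question_15(u):
    """(15) applied to world u: { λw. n = the "John" person in w | n a person in u }."""
    return frozenset(prop(lambda w, n=n: w["john"] == n)
                     for n in WORLDS[u]["persons"])

def licit(question, context_set):
    """Condition of use: the question yields the same proposition set at every world of C."""
    return len({question(u) for u in context_set}) == 1

print(licit(question_15, [0, 1]))  # True: one set of two propositions
```

Because the persons present do not vary across the context set, (15) yields the same two-membered set at both worlds, which is what makes its use licit on this model.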

With this in mind, here is the reasoning behind the claim that, if we assume Idea B,
our initial question becomes a syntactic puzzle. (i) There is one LF (call it LF1) that it



⁸ This is reminiscent of a principle that Stalnaker ([9]) argues for. A slightly distorted version
of Stalnaker’s view is: declarative sentences yield not propositions, but functions from worlds
to propositions; a condition on use limits us to those sentences that take every world in the
context set to the same proposition. As it happens, Stalnaker also argues that in cases where
this condition is not met, speakers can make use of a recovery procedure - “diagonalization”
- that, out of the function that the sentence denotes, determines a single proposition. The
idea is that, given the function p, the procedure yields λw. p(w)(w). Maybe, in cases where
the condition of use for questions is not met, there is a parallel recovery procedure that, out of
the function that the question denotes, yields a single set of propositions. Given the function
Q, say, the procedure might first choose a set of (declarative-like) functions, Q’, such that
Q = λu. { p(u) | p ∈ Q’ }, and then on this basis produce the set of propositions
{ λw. p(w)(w) | p ∈ Q’ }. However, I will ignore this possibility in what follows.
is plausible that we can generate for (4) and that has precisely the denotation in (15).
(ii) If the LFs that we could build for (4) were exactly parallel to the LFs that we can 
build for some other questions, we might also expect to be able to generate a second 
LF (call it LF2), which does not have the denotation in (15). (iii) So to explain why 
(4) has the meaning it has, we have to explain why the syntax cannot generate LF2. 
In seeking to explain why the syntax cannot generate LF2, there are naturally different 
avenues to investigate, since the architecture of LF2 is not identical to the architecture 
of LF1.

In the ensuing subsections, I will flesh out some details of these steps of reasoning. 
Space limitations force me to be brief, though. 

4.1 Step i: An LF with the Right Denotation That We Can Plausibly Generate for (4)

The simplified structure in (16) gives the general idea of one LF that we can plausibly 
generate for (4). (I assume here that the verb has lowered back into IP, and that who 
actually has a complex structure. Note too that the structure contains silent items that 
function as variables over possible worlds.) It would not be controversial to posit 
such an LF, and it gives us just the denotation in (15).⁹

(16) [Tree diagram of the LF; the root CP is of type <s,<st,t>>]




4.2 Step ii: Why We Might also Expect a Second LF with a Different Denotation 

The reason why we cannot stop here and say that we have solved our problem is that it 
isn’t obvious that an LF like (16) is the only LF the question has, and other potential 



⁹ To get (16) to yield the denotation in (15), we need these denotations for its parts:
[[wh]]^g = λF. λP. { P(n) | F(n) = 1 }; [[person]]^g = λn. n is a person in w;
[[ ... t₂ is John ... ]]^g = λw. g(2) = the person who has the “John” property in w. Note that wh takes
a predicate and a property (like the property of “being John”), and creates a set of
propositions each of which says that the property holds of some individual the predicate
characterizes.
LFs might not lead to a denotation of the kind we want. What other potential LFs do
I have in mind? Some background is in order. 

Basically since Engdahl ([2]), it has been clear that, once we say that questions like
(17) have one LF that makes relevant propositions that vary with respect to an
individual ((18)), then we must say they also have a second LF, which makes relevant
different kinds of propositions. We must take this step because the questions license 
answers such as (19) even when such answers are not contextually equivalent to any of 
the propositions in (18) — for example, when it isn’t known who the candidates are. 
What kind of propositions does the second LF make relevant? A possible answer is: 
propositions that differ with respect to an individual concept ((20)). But now the 
worry should be clear: if (17) admits a second LF that makes relevant propositions 
that vary with respect to an individual concept, then (4) potentially might as well. 
And in that case, we might lose our account of the restriction on (4)’s interpretation. 

(17) Who will win the next election? 

(18) { λw. in w, individual n wins the next election | n an individual }

(19) The candidate with the biggest campaign budget [will be the winner]. 

(20) { λw. in w, F(w) wins the next election | F a function from worlds to individuals }

(e.g., F₁(w) = the election candidate in w with the biggest campaign budget in w)
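The motivation for a second LF can be replayed in a toy model. Assuming a context set in which neither the winner nor the biggest-budget candidate is known (two hypothetical individuals, a and b), answer (19) is contextually equivalent to no member of (18), yet it is literally one of the propositions in (20), with F₁ as the individual concept. This is only an illustrative sketch; the list of concepts standing in for (20) is truncated to three for brevity.

```python
from itertools import product

INDIVIDUALS = ("a", "b")  # hypothetical individuals

# Context set: every combination of who wins and who has the biggest budget,
# since neither fact is known to the conversational participants.
C = [{"winner": win, "budget": bud} for win, bud in product(INDIVIDUALS, repeat=2)]

def equivalent(p, q, context_set):
    """Contextual equivalence: p and q agree on every world of the context set."""
    return all(p(w) == q(w) for w in context_set)

# (18): propositions varying with respect to an individual n.
props_18 = [lambda w, n=n: w["winner"] == n for n in INDIVIDUALS]

# (20): propositions varying with respect to an individual concept F
# (only three concepts listed; the first is F1, the biggest-budget concept).
concepts = [
    lambda w: w["budget"],  # F1
    lambda w: "a",          # constant concept picking out a
    lambda w: "b",          # constant concept picking out b
]
props_20 = [lambda w, F=F: w["winner"] == F(w) for F in concepts]

def answer_19(w):
    """(19): the candidate with the biggest campaign budget wins in w."""
    return w["winner"] == w["budget"]

def answers(answer, props, context_set):
    """True if the answer is contextually equivalent to some proposition in props."""
    return any(equivalent(answer, q, context_set) for q in props)

print(answers(answer_19, props_18, C))  # False
print(answers(answer_19, props_20, C))  # True
```

If the context set is narrowed to worlds where a has the biggest budget, (19) does become contextually equivalent to the (18) proposition for a; the need for (20) arises precisely when the budget holder is not known.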
Naturally, what additional LF we might expect (4) to admit depends on what we 
decide the second LF for (17) looks like. In (21) I have given one possibility (along
the lines of the functional wh-analyses of [2], [1] and [5]). This LF has the same basic 
architecture as the earlier one in (16), but differs in two important ways: first, the wh- 
phrase contains a silent affix; second, movement leaves a complex trace (it consists of 
an item that functions as a variable over concepts together with an item that functions 
as a variable over worlds). The net effect of these differences is to give individual 
concepts the role that individuals play in (16): this LF will make relevant propositions 
that vary with respect to individual concepts rather than individuals.¹⁰ If (4) admitted
this LF, we would not be able to account for (4)’s meaning. Why is this? The deno- 
tation of (21) is (roughly) as in (22). This denotation means that (23) is among the 
propositions that the LF makes relevant. But Idea B suggests an analysis of John is 
the violinist (our (9a)) basically along these lines, so if (4) had this LF, we would 
expect John is the violinist to be a possible response - contrary to fact. 



¹⁰ Note the change in the order of person’s arguments, and the appearance of world variables in
IP (I simplified away from these details in the discussion of the earlier LF). To get out of
(21) a denotation that “ranges over” individual concepts, we will need the following
denotations, among others: [[AFF]](P)(K) = 1 iff for all w in dom(K), P(K(w))(w) = 1;
[[wh]] = λF. λP. { P(N) | F(N) = 1 }. (The new cross-categorial denotation for wh- enables
it to range over individual concepts. The effect of AFF is that NP will be a predicate of
individual concepts rather than of individuals. The effect of the “big trace” and the lambda
that binds it will be to create out of the movement remnant too a function that takes individual
concepts, rather than individuals, as arguments.)
(21) [Tree diagram of the second LF; the root CP is of type <s,<st,t>>]




(22) λu. { λw. in w, K(w) has the “John” property | for all v, K(v) yields a person in v }

(23) λw. in w, the violinist in w has the “John” property

4.3 Step iii: The Syntax Must Not Generate the Second LF 

So if we adopt Idea B, we arrive at a syntactic puzzle. We can account for the restric- 
tion on the use of (4) if we say that (4) admits the first kind of LF - the one in (16), 
the kind that makes relevant a set of propositions that differ with respect to an individ- 
ual - but does not admit the second kind of LF - the one in (21), the one that makes 
relevant a set of propositions that differ with respect to an individual concept. Evi- 
dently, even if the syntax can generate LFs of the second kind for other sentences, like 
(17), it cannot generate this LF for (4). The puzzle is: why is this second LF ex- 
cluded? 

I don’t know how tractable this problem is, but at least it is clear what the problem 
is that we have to solve. And there are places to start investigating. We noticed some 
ways in which the architecture of the second LF differs from that of the first. One is 
that the second LF contains a silent affix, another is that the second LF contains a 
complex trace. Might it be that the syntax is unable to generate a complex trace in the 
position where it appears in the second LF? These are the kinds of questions to ask. 

5 Concluding Remarks 

What I hope came out of the discussion is this. When we look at ways of combining 
assumptions about ontology, on the one hand, with ideas about interpretation, on the 
other, we find that some ways are more suited to describing natural language than 
other ways are. Therefore, if we take a stand on one of these things - in this case, a 
theory of how we interpret syntactic structures - we will be driven to conclusions 
about the other — in this case, about what view of possible worlds is the right one for 
modeling common grounds like the one Scenario II leads to. In this way, theories of 
how we interpret syntactic structures can become tools for investigating how we rep- 
resent common grounds. 
The discussion so far suggested that we should adopt Idea B, on which it makes
sense to talk about a “John” property that individuals have in some worlds but not 
others. This is interesting for a number of reasons. For one thing, it could help us to 
understand what articles in the copula literature mean when they say that name-like 
expressions can sometimes behave like “predicates” and sometimes like “referring 
expressions.” On Idea B, there is a natural way of explicating this terminology: a 
name like John behaves like a “predicate” when the “John” property holds of different 
individuals in different worlds in the context set, and behaves like a “referring expres- 
sion” when the “John” property holds of the same individual in all worlds in the con- 
text set. On Idea A, by contrast, explicating this terminology looks less straightfor- 
ward. 

Speculating, here is another advantage that Idea B might have. It might lead to in- 
sight as to why, unlike our friend (24a), parallel sentences with pronouns like (24b) 
seem bizarre in any context. When we ask how the denotations of questions like 
(24a) are built out of the parts of the sentence, it is a small step to say that the name 
John denotes an individual concept - the person who has the “John” property. It is 
another small step to say that in the structure for this sentence, John combines with a 
world variable, and the result occupies a slot reserved for an individual-denoting ex- 
pression ((25a)). Now, the idea to pursue is that, while names denote properties, pro- 
nouns denote individuals. In that case, in the structure for (24b), the pronoun will 
occupy the position that the name-world variable complex occupies in the structure for 
(24a). If the pronoun in (24b) is a simple variable, the result will be that, when we 
compute the denotation for the question in (24b), and we look at the propositions the 
question makes relevant, we will find only tautologies or contradictions.**

(24) a. Who is [ 1 1 John ] ? b. ?? Who is [ 1 1 him ]? 





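$[\![(25\mathrm{a})]\!]^g = \lambda u.\,\{\lambda w.\ n = \text{the person who has the ``John'' property in } w \mid n \text{ is a person in } u\}$

$[\![(25\mathrm{b})]\!]^g = \lambda u.\,\{\lambda w.\ n = g(4) \mid n \text{ is a person in } u\}$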
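The tautology-or-contradiction prediction for (24b) can be checked in a similar toy model. The worlds, the individuals a and b, the distribution of the “John” property and the assignment g(4) = b are all invented for illustration. The restriction to persons in u is trivially satisfied here, because every individual counts as a person in every world.

```python
# Hypothetical toy model, not taken from the paper.
WORLDS = ("w1", "w2", "w3")
PEOPLE = ("a", "b")
JOHN = {"w1": "a", "w2": "b", "w3": "a"}  # Idea B: the "John" property varies
g = {4: "b"}                              # invented assignment for the pronoun's index


def prop(pred):
    """A proposition is the set of worlds where it holds."""
    return frozenset(w for w in WORLDS if pred(w))


def name_question():
    # Analogue of (25a): n = the person with the "John" property in w.
    return {prop(lambda w, n=n: n == JOHN[w]) for n in PEOPLE}


def pronoun_question():
    # Analogue of (25b): n = g(4), which does not depend on w at all.
    return {prop(lambda w, n=n: n == g[4]) for n in PEOPLE}


def is_trivial(p):
    """True for a tautology (all worlds) or a contradiction (no world)."""
    return p == frozenset(WORLDS) or p == frozenset()


print(all(is_trivial(p) for p in pronoun_question()))  # True
print(any(not is_trivial(p) for p in name_question()))  # True
```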



** The explanation is not obviously so simple: questions like Which one is t t him? are fine
(imagine that we are at the opera, and know that a mutual friend of ours is in the chorus), and
this suggests that we might not always want to say that pronouns are simple individual vari-
ables.
Appendix: A Third Idea

Naturally, Idea A and Idea B are not the only views that one might entertain about 
how to model the common ground in Scenario II. Here is a sketch of a third idea. 
This third idea departs from our original assumptions in that it assumes that a set of 
possible worlds alone is not sufficient to describe an information state: in a certain 
sense, a description in terms of possible worlds has to be supplemented with a de- 
scription of what is known about who is named what. 

This third idea shares with Idea B the view that in all of our possible worlds the
same individual - call him x - is on the left, and the same individual - call him y - is
on the right. And it shares with Idea A the view that names are meant to talk about
individuals. But there is an important difference between the third idea and our other 
ideas. Idea A and Idea B assume that the parties to conversation have decided what 
individual, or property, it is that a name like John evokes, and therefore that sentences 
with the name John impart some kind of information to them. The third idea assumes 
instead that the parties to conversation have not resolved what individual to associate 
with the name John, and if they were to hear a sentence like John is happy, they would 
not necessarily be able to determine a way in which to reduce the context set. 

(26) [diagram]
A proponent of this third view would look at our scenario as a case in which
communication does not proceed quite as smoothly as it could. The idea would be roughly
as follows. At the outset, since we do not know that to talk about John is to talk about
x, there are a lot of sentences with the name John that we could not use to narrow
down the worlds in our candidate set - including the phrase that accompanied the 
introduction, say I present you John. The principal effect of a sentence like The guy 
on the left is John ((7)) is then not to eliminate worlds from our candidate set, but just 
to let us know that there is a certain way of talking about the worlds in our candidate 
set - that to talk about John is to talk about x. Once we know this, we are then in a 
position to use the information provided by other phrases, like I present you John, and 
to narrow down our candidates to those worlds in which we have been introduced to x. 

In other words, unlike on Idea A and Idea B, on the third idea there is no sense in 
which the worlds in the context set divide up into two classes according to who is 
John. And an assertion like The guy on the left is John is not designed to get us to 
eliminate worlds from our context set - in fact, it does not directly cause the elimina- 
tion of any. Rather, it is designed to allow us to interpret previously uninterpretable 
messages.*^ 
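The third idea can be sketched as an information state that pairs a context set with a record of what is known about who is named what. Everything in the sketch is hypothetical: the class InfoState, the two worlds and the INTRODUCED_TO table are only loosely modelled on the scenario. An assertion like The guy on the left is John only extends the naming record. A sentence with an unresolved name leaves the context set untouched.

```python
from dataclasses import dataclass, field

# Hypothetical facts: in which world were we introduced to which individual?
INTRODUCED_TO = {"w1": "x", "w2": "y"}


@dataclass
class InfoState:
    worlds: frozenset                          # the context set
    names: dict = field(default_factory=dict)  # what is known about who is named what

    def learn_name(self, name, individual):
        # e.g. "The guy on the left is John": no world is eliminated;
        # we only learn a way of talking about the worlds.
        return InfoState(self.worlds, {**self.names, name: individual})

    def assert_about(self, name, holds):
        # A sentence about <name> can narrow the context set only once the name is resolved.
        if name not in self.names:
            return self  # not yet interpretable: context set unchanged
        referent = self.names[name]
        return InfoState(frozenset(w for w in self.worlds if holds(w, referent)), self.names)


def introduced(w, person):
    # "I present you John", evaluated relative to John's referent.
    return INTRODUCED_TO[w] == person


s0 = InfoState(frozenset({"w1", "w2"}))
s1 = s0.assert_about("John", introduced)  # unresolved name: nothing eliminated
s2 = s1.learn_name("John", "x")           # "The guy on the left is John"
s3 = s2.assert_about("John", introduced)  # now the introduction narrows the set
print(sorted(s1.worlds), sorted(s3.worlds))  # ['w1', 'w2'] ['w1']
```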

What consequences might this third idea have for an analysis of our fact about the 
use of questions like (4)? In fact, it is not transparent how to apply this idea, but here 
is an attempt. 

Maybe one could maintain what might at first look like a non-starter — that the 
question makes relevant a set of propositions that, in terms of the way they are built 
up, vary with respect to an individual, but that are either a tautology or a contradiction 
((27)). However, we must also stipulate that the question requires its answer to take a 
certain form: to be of the form X is John. What will happen then? The idea is roughly 
as follows. Since one of the propositions is a tautology, the answerer will have the 
obligation to utter a proposition that is true in all worlds in the context. As soon as he 
answers with a sentence of the form The guy on the left is John, the parties to conversation,
who know that x is the guy on the left, will see that to talk about John is to talk about
x. This, then, looks like another case where we would like to say that a question
makes relevant a set of propositions that vary with respect to an individual. So at first
glance at least, it looks as though, since we want to ensure variation with respect to an
individual, we are going to wind up with the same problems that we have to face with 
Idea B. 

(27) $\{\lambda w.\ x = j,\ \lambda w.\ y = j,\ \lambda w.\ z = j,\ \ldots\}$



Line Mikkelsen pointed out to me that the proposals of Groenendijk, Stokhof and Veltman 
([3]) are in this spirit. 
Abstract. Controlled and restricted dialogue systems are reliable
enough to be deployed in various real world applications. The more 
conversational a dialogue system becomes, the more difficult and unreliable
recognition and processing become. Numerous research projects are struggling
to overcome the problems arising with more or truly conversational dialogue
systems. We introduce a set of contextual coherence measurements that can
improve the reliability of spoken dialogue systems by including contextual
knowledge at various stages in the natural language processing pipeline. We
show that situational knowledge can be successfully employed to resolve
pragmatic ambiguities and that it
can be coupled with ontological knowledge to resolve semantic ambi- 
guities and to choose among competing automatic speech recognition 
hypotheses. 



1 Introduction 

Following Allen et al. (2001), we can differentiate between controlled and conver- 
sational dialogue systems. Since controlled and restricted interactions between 
the user and the system decrease recognition and understanding errors, such sys- 
tems are reliable enough to be deployed in various real world applications, e.g. 
timetable or cinema information systems. The more conversational a dialogue 
system becomes, the less predictable are the users’ utterances. Recognition and 
processing become increasingly difficult and unreliable. Research projects are 
struggling to overcome the problems arising with more or truly conversational
dialogue systems, e.g. Wahlster et al. (2001). Their goals are more intuitive 
and conversational natural language interfaces that can someday be used in real 
world applications. The work described herein is part of that larger undertaking: 
we view the handling of contextual - and therefore linguistically implicit - information
as one of the major challenges for understanding conversational utterances
in complex dialogue systems. 

In this paper we report on a set of research issues, solutions and results 
pertinent to the construction of mobile multi-domain spoken dialogue systems. 
These systems aim at providing conversational speech interfaces to complex and 
heterogeneous applications and their domains, e.g. touristic, spatio-geographic 
or entertainment information as well as various assistance domains such as
planning, electronic communication or electronic commercial transactions. A
common feature of the solutions, to be described below, is that they involve the
inclusion of extra-linguistic contexts into the natural language processing (NLP) 
pipeline by applying contextual coherence measurements. 

In this work, we will focus on two specific extra-linguistic knowledge stores - 
namely ontological- and situational knowledge - and introduce the corresponding 
ontological and situational coherence measurements.^ Ontological knowledge,
for example, may assert that a bakery is a store and that it has specific proper- 
ties, such as opening times, specific goods for sale etc. Situational knowledge, on 
the other hand, may assert that the bakery Seitz is located in a specific street 
and currently open. Given a user utterance such as: Is there a bakery somewhere 
around here?, we ultimately want an NLP system to understand that the user 
might want to go there in order to buy something to eat and supply correspond- 
ing spatial instructions - to the nearest bakery or other shop depending on what 
is actually open given the situation at hand - rather than answering the
question solely with yes or no. While the ontologies employed herein model more or
less static world, conceptual and common-sense knowledge concerning types and
roles (Russell and Norvig, 1995), based on standard combinations of frame
and description logics, situational knowledge concerns specific instances and
highly dynamic states of affairs.
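The division of labour between the two knowledge stores can be pictured as a type-level ontology combined with instance-level situational facts. The concept names and the opening hours for Seitz below are invented for illustration. The plain dictionaries stand in for the frame- and description-logic ontologies mentioned above; they are not an implementation of them.

```python
from dataclasses import dataclass
from datetime import time

# Ontological knowledge (static, type level): a hypothetical mini is-a hierarchy.
IS_A = {"Bakery": "Store", "Store": "Place"}


def is_a(concept, ancestor):
    """Walk up the is-a chain starting from concept."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


# Situational knowledge (instance level, dynamic).
@dataclass
class Instance:
    name: str
    concept: str
    opens: time   # hypothetical opening hours
    closes: time

    def is_open(self, now: time) -> bool:
        return self.opens <= now < self.closes


def open_places(instances, concept, now):
    """Names of instances of concept (or of a subconcept) that are open at `now`."""
    return [i.name for i in instances if is_a(i.concept, concept) and i.is_open(now)]


seitz = Instance("Seitz", "Bakery", time(6, 0), time(18, 0))
print(open_places([seitz], "Store", time(9, 30)))  # ['Seitz']
print(open_places([seitz], "Store", time(20, 0)))  # []
```

The ontology alone says that a bakery is a store. Only the situational facts can say whether this particular bakery is a sensible destination right now.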

Our overall goal is to produce reliable natural language understanding com- 
ponents that increase dialogue quality metrics,^ by applying context sensitive 
analysis such as described below. After a brief outline of contextual processing 
in spoken dialogue systems in Sect. 2, we will introduce situational coherence 
and the resulting model, employing data, analyses and results from the domain 
of spatial information in Sect. 3. We will discuss data, results and model for on- 
tological coherence scoring applied in automatic speech recognition and semantic 
interpretation in Sect. 4. A conclusion on contextual coherence scoring is given 
in Sect. 5. 

2 Contextual Interpretation in NLP 

Utterances in dialogues, whether in human-human interaction or human- 
computer interaction, occur in a specific situation that is composed of different 
types of contexts. A broad categorization of the types of context relevant to spo- 
ken dialogue systems, their content and respective knowledge stores is given in 
Table 1. Following the common distinction between linguistic and extra-linguistic
context,^ our first category, i.e. the dialogical context, constitutes the linguistic
context, encompassing both co-text and intertext.

^ The role of linguistic- and user-context for NLP is included via discourse-, user- and 
belief-modeling (LuperFoy 1999, Paris 1993, Narayanan 1997). 

^ Measurable in the PARADISE evaluation framework (Walker et al. 2000). 

® All extra-linguistic contexts are often collectively referred to as the situational context
(Connolly, 2001); however, we adopt a finer categorization thereof.
Table 1. Contexts, content and knowledge sources

Type of context          | Content                          | Knowledge store
-------------------------|----------------------------------|----------------
dialogical context       | what has been said by whom       | dialogue model
situational context      | time, place, etc.                | situation model
interlocutionary context | properties of the interlocutors  | user model
domain context           | world / conceptual knowledge     | ontology



In linguistics the study of the relations between linguistic phenomena and 
aspects of the context of language use is called pragmatics. Any theoretical 
or computational model dealing with reference resolution, e.g. anaphora- or 
bridging resolution, spatial- or temporal deixis, or non-literal meanings, requires 
taking the properties of the context into account. In current spoken dialogue 
systems contextual interpretation follows semantic interpretation, which follows 
automatic speech recognition (ASR) (additionally fused with other modality-specific
information). That is, the modality-specific signals (e.g. speech or gesture)
are transferred into graphical representations (e.g. word- or gesture graphs)
and then fused and mapped onto some meaning representation followed by con- 
textual interpretation (Allen, 1987). Computationally, this implies that context- 
independent graphical and semantic representations can be computed and the 
context-dependent contributions are associated with the semantic interpretation 
thereafter, resulting in the final representation. 

This so-called modular view supports a distinct study of meaning (corresponding
to the semantic representation) without having to muck around in the
murky waters of language use. This view is supported by the claim that some
semantic constraints seem to exist independent of context. In this work we pro- 
pose a different view that also allows for context-independent constraints, but 
offers a less modular point of view of contextual interpretation. We will show 
that, given the notion of context introduced above, contextual analysis can be 
employed already at the level of speech recognition, during semantic interpretation
and, of course, thereafter. Our central claim is that, as in human processing,
contextual knowledge can be used successfully in a computational framework at
all processing stages.^ While most research in linguistics
has consequently departed from this view, most computational approaches still 
feature a modular pipeline architecture in that respect. 

In linguistics utterances which are context-dependent are called indexical ut- 
terances (Bunt, 2000). Computationally they exhibit a difference in their seman- 
tic and final representation. Indexical utterances are - by virtue of the pervasive- 
ness of contextual knowledge - the norm in discourse, with linguistic estimations 

In recent times the so-called modular theory of cognition (Fodor, 1983) has been 
abandoned more or less completely. The so-called new look or modern cognitivist 
positions hold that nearly all cognitive processes are interconnected, and freely ex- 
change information; e.g. influences of semantic and pragmatic features have been 
shown to arise already at the level of phonological processing (Bergen, 2001). 
of declarative non-indexical utterances at around 10% (Bar-Hillel, 1954). Without
contextual knowledge, utterances, or fragments thereof, become susceptible to
interpretation in more than one way. Computer languages are designed to avoid
anaphoric, syntactic, semantic and pragmatic ambiguity, but human languages
seem to be riddled with situations where the listener has to choose between
multiple interpretations. In these cases we say that the listener performs pragmatic
analysis, corresponding to contextual interpretation on the computational side.
For human beings the process of resolution is often unconscious, to the point that 
it is sometimes difficult even to recognize that there ever was any ambiguity. 

That this process of resolution frequently goes unnoticed is due to the fact
that, in many cases, the ambiguity is only perceived if the contextual factors
that allowed the listener to interpret the utterance unambiguously are missing,
for example, if shared ontological and situation-specific knowledge provided
information that was elided in the utterance. These utterances or texts,
therefore, become ambiguous only after they have been stripped of discourse,
situation, domain and speaker context and appear, for example, as a text
(fragment) in a linguistics textbook. The problem for computational linguistics
originates at least partially in the fact that language understanding has to make do
with exactly such contextually and pragmatically impoverished input.

3 Situational Coherence 

In this section we present findings from experiments tailored towards identifying
and learning contextual factors relevant to understanding a user’s utterance
in an uncontrolled dialogue system. That system supplies touristic and
spatial information (Porzel and Strube, ). In these data we find many instances of
phenomena usually labeled as pragmatic ambiguity. In our view these examples
constitute bona fide cases for contextual interpretation after phonological and
semantic processing has been concluded. We show how natural language analysis
can employ models that incorporate specific situational factors, resulting
in a context-dependent analysis of the given utterances and thereby increasing the
conversational capabilities of dialogue systems.

Several NLP research efforts have adopted the tourism domain as a suitably 
complex challenge for an intuitive conversational natural language processing 
system (Johnston et al. 2002, Wahlster et al. 2001). Supplying spatial infor- 
mation, specifically spatial instructions and spatial descriptions, constitutes an 
integral part of the functionality of a mobile tourist information system. We re- 
gard a spatial instruction - e.g. “In order to get to the castle you have to turn 
right and follow the path until you see the gate tower” - as a felicitous response 
to a corresponding instructional request. A spatial description - e.g. “The 
Cinema Gloria is near the marketplace on the Hauptstrasse” - is appropriate
for a descriptive request. 

We can, therefore, say that a spatial instruction is an appropriate response 
to an instructional request and a spatial description, e.g. a localization, consti- 
tutes an appropriate response to a descriptive request. Responding with one to 
the other does not constitute a felicitous response, but can be deemed a
misunderstanding of the questioner’s intention, i.e. an intention misrecognition. In
all dialogue systems intention misrecognitions decrease the overall evaluation 
scores, since they harm the dialogue efficiency metrics, as the user is required 
to paraphrase the question, resulting in additional dialogue turns. Furthermore, 
satisfaction measures decrease along with perceived task ease and expected sys- 
tem behavior.® 

The Data: In an initial data collection (Porzel and Gurevych, 2002) we find 128 
instances of instructional requests out of a total of roughly 500 requests from 49 
subjects. The types and occurrences of these categories are given in Table 2.



Table 2. Request types and occurrences

Type                              | Example                              |  #  |  %
----------------------------------|--------------------------------------|-----|------
(A) How interrogatives            | How do I get to the Fischergasse     |  38 | 30%
(B) Where interrogatives          | Where is the Fischergasse            |  37 | 29%
(C) What/which interrogatives     | What is the best way to the castle   |  18 | 14%
(D) Imperatives                   | Give me directions to the castle     |  12 | 9.5%
(E) Declaratives                  | I want to go to the castle           |  12 | 9.5%
(F) Existential interrogatives    | Are there any toilets here           |   8 | 6%
(G) Others                        | I do not see any bus stops           |   3 | 2%



While handling both instructional and descriptive requests for spatial information,
our parsers identify types A, C, D and E as instructional requests. This
corresponds to a baseline of recognizing roughly 63% of the instructional requests
contained in our first data sample as such. Changing the grammars to
treat types B and F as instructional requests would consequently raise the
coverage to 98%. However, Where interrogatives occur not only as requests
for spatial instructions but also as requests for spatial descriptions, i.e.
localizations.® The problem is that the current parser grammars interpret either all
Where interrogatives as descriptive requests or all of them as instructional requests.
This implies that both systems either misinterpret 29% of the instructional
requests from our initial data as descriptive requests or misinterpret all descriptive
requests as instructional ones. In short, they lack a systematic way of asking
which type of Where interrogative might be at hand.^
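The two coverage figures follow directly from the counts in Table 2. The short computation below reproduces them. It shows that the quoted 63% and 98% are rounded values.

```python
# Counts per request type, copied from Table 2.
COUNTS = {"A": 38, "B": 37, "C": 18, "D": 12, "E": 12, "F": 8, "G": 3}
TOTAL = sum(COUNTS.values())  # 128 instructional requests


def coverage(types):
    """Fraction of instructional requests whose type is handled as instructional."""
    return sum(COUNTS[t] for t in types) / TOTAL


baseline = coverage("ACDE")    # 80 / 128
extended = coverage("ABCDEF")  # 125 / 128
print(f"{baseline:.1%}, {extended:.1%}")  # 62.5%, 97.7%
```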

Based on these observations, we conducted an experiment in which we
asked people on the street the same Where interrogative every time, i.e. Excuse me,
can you tell me where X is. We logged several factors:

® Unfortunately, in PARADISE dialogue quality metrics are not affected by intention
misrecognitions, as they are not taken into account (Walker et al. 2000).

® Numerous instances of Where interrogatives requesting spatial localizations can 
be found also in other corpora such as the HCRC Map Task Corpus. 

^ As the data discussed herein show, a simple approach that employs the system’s
class-based lexicon to make this decision hinge on the object type, e.g. building or
street, will not suffice to solve the problem completely.
- the goal object, i.e. either the castle, city hall, a specific school, a specific
discotheque, a specific cinema, a bank (ATM) or a specific clothing store, all
of which can be either open or closed depending on the time of day,

- the time of day (i.e. morning, afternoon, evening),

- the proximity to the goal object, i.e. near (< 5 minutes walk), medium (5 - 30
minutes walk) and far (> 30 minutes walk);

- additionally, we kept track of the approximate age group (young, middle, old)
and gender of the subjects.

For this set of contextual features, the decision trees and rules generated by
applying the C4.5 learning algorithm (Winston, 1992) show that:

- if the object is currently closed, e.g. a discotheque or cinema in the morning, 
almost 90% of the Where interrogatives are answered by means of localizations, a 
few subjects asked whether we actually wanted to go there now, and one subject 
gave instructions. 

- if the object is currently open, e.g. a store or ATM machine in the morning, 
people responded with instructions, unless - and this we did not expect - the 
goal object is near and can be localized by means of a reference object that is 
within line of sight. 
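The two findings can be restated as a hand-written decision rule. This is a paraphrase for illustration, not the tree that C4.5 actually produced. It keeps only the majority responses and ignores the minority of subjects who answered with a question or, in one case, with instructions.

```python
from enum import Enum


class Response(Enum):
    LOCALIZATION = "localization"
    INSTRUCTION = "instruction"


def expected_response(goal_open: bool, proximity: str, reference_in_sight: bool) -> Response:
    """Majority response to a Where interrogative, paraphrasing the findings above.

    proximity is one of "near", "medium", "far" (the experiment's three bins).
    """
    if not goal_open:
        # Closed goal object (e.g. a cinema in the morning): mostly localizations.
        return Response.LOCALIZATION
    if proximity == "near" and reference_in_sight:
        # Open but near, with a reference object within line of sight.
        return Response.LOCALIZATION
    return Response.INSTRUCTION


print(expected_response(True, "medium", False).value)  # instruction
```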

Looking at the problem of analyzing Where interrogatives correctly, we 
can conclude already that, depending on the combination of at least two con- 
textual features, accessibility and proximity, responses were either instructions, 
localizations or questions. The following sections will describe how we have cho- 
sen to incorporate findings such as the ones described above into the natural 
language understanding process. 

Requirements for Contextual Analysis: We have noted above that current 
natural language understanding systems lack a systematic way of asking, for 
example, whether a given Where interrogative at hand is construed as an in- 
structional or a descriptive request. Speakers habitually rely on situational and 
other contextual features to enable their interlocutor to resolve such constru- 
als appropriately. This is not at all surprising, since conversational dialogues - 
whether in human-human interaction or human-computer interaction - that oc- 
cur in a specific context are consequently composed of utterances based upon 
specific knowledge of that context. 

In order to capture the diverse kinds of contextual information, studies and 
experiments of the type described above need to be conducted, so that the 
individual factors and their influences for a set of additional construal resolutions 
can be identified and formalized. Looking at the domain of spatial information 
alone we find a multitude of additional decisions that need to be made in order to 
enable a dialogue system to produce felicitous responses. Besides the instruction
versus localization decision, we find construal decisions such as:

— does the user want to enter, view or just approach the goal object 

— does the user want to take the shortest, fastest or nicest path 

— does the user intend to walk there, drive or take public transportation 

as relevant to answering instructional requests felicitously. In many cases, e.g. 
the ones noted above, construal resolution corresponds to an automatic context-dependent
generation of paraphrases in the sense of Ebert et al. 2001, that is,
to explicate information that was left linguistically implicit, e.g. to expand an
utterance such as How do I get to the castle, depending on the context, into How
do I get to the castle by car on a scenic route.

These decisions hinge on a number of contextual features much like the in- 
struction versus localization decision discussed above.® In our minds a model 
resolving the construal of such questions has to satisfy the following demands: 

— it has to model the data collected in the experiments, which provide the 
statistical likelihoods of the relevant factors, for example, the likelihood of
a Where interrogative being construed as a descriptive or instructional 
request, given the accessibility of the goal object, 

— it has to be able to combine the probabilistic observations from various
heterogeneous knowledge sources, e.g. what if the object is currently accessible,
but too far away to reach within a given time period, 

— it has to be robust against missing and uncertain information, as these con- 
textual features may not always be observable, e.g. in case specific services of 
the system such as location modules (GPS) or weather information services 
are currently offline. 

Applying the Contextual Analysis: As a first approach we have chosen 
Belief- or Bayesian networks employing a generalized version of the variable 
elimination algorithm, described in Cozman (2000), to represent the relations 
and conditional probabilities observed in the data and to compute the poste- 
rior probabilities of the decision at hand. Bayesian networks are well-suited for 
combining heterogeneous, independent and competing input to produce discrete 
decisions and can even be regarded as suitable mathematical abstractions over 
the cognitive processes underlying the way human speakers process natural
language (Narayanan and Jurafsky, 1998). The simplest network possible, estimating
the likelihood of a Where interrogative being construed as an instructional
or descriptive request, needs only three observation nodes. These nodes observe 
whether a Where interrogative is at hand, the goal object is open or closed 
and its proximity to the user. The single decision node - whether a spatial lo- 
cation or instruction constitutes an appropriate response - is connected to the 
three observation nodes. 
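A network of this kind can be sketched in a few lines. The structure is an assumption: the decision node is treated as the parent of the three observation nodes, i.e. a naive Bayes arrangement. All probabilities are invented rather than estimated from the collected data. Inference uses plain enumeration, which on a network this small yields the same posteriors as variable elimination. Unobserved variables are summed out, which is one way to meet the robustness requirement for missing context features. The result is a ranked list of construals with scores.

```python
from itertools import product

# Invented prior and conditional probability tables (not the paper's estimates).
PRIOR = {"instruct": 0.5, "localize": 0.5}
CPT = {
    "where_q": {"instruct": {True: 0.3, False: 0.7}, "localize": {True: 0.9, False: 0.1}},
    "goal_open": {"instruct": {True: 0.8, False: 0.2}, "localize": {True: 0.3, False: 0.7}},
    "proximity": {
        "instruct": {"near": 0.2, "medium": 0.4, "far": 0.4},
        "localize": {"near": 0.5, "medium": 0.3, "far": 0.2},
    },
}


def posterior(evidence):
    """Rank the decisions by P(D | evidence); unobserved variables are summed out."""
    hidden = [v for v in CPT if v not in evidence]
    scores = {}
    for d, p_d in PRIOR.items():
        total = 0.0
        # Enumerate every joint value of the hidden variables.
        for values in product(*(CPT[v][d].keys() for v in hidden)):
            assignment = {**evidence, **dict(zip(hidden, values))}
            p = p_d
            for var, val in assignment.items():
                p *= CPT[var][d][val]
            total += p
        scores[d] = total
    z = sum(scores.values())
    return sorted(((d, s / z) for d, s in scores.items()), key=lambda item: item[1], reverse=True)


# Open goal object, far away: "instruct" is ranked first (about 0.64).
print(posterior({"where_q": True, "goal_open": True, "proximity": "far"}))
# No location service available: proximity is simply left out of the evidence.
print(posterior({"where_q": True, "goal_open": True}))
```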

We have linked the network to interfaces providing that contextual informa- 
tion. For example within the SmartKom framework (Wahlster et al. 2001), a 
database called the Tourist-Heidelberg-Content Base supplies information about
individual objects including their opening and closing times. A global position- 
ing system built into the mobile device supplies the current location of the user. 
This is handed to the geographic information system that computes the respec- 
tive distances and routes to the specific objects. It is important to note that this 
type of context monitoring is a necessary prerequisite for context-dependent 

® Here also ontological factors, e.g. object type and role, additional situational factors, 
e.g. weather, discourse factors, e.g. referential status, as well as user-related factors, 
e.g. tourists or business travelers as questioners and their time constraints, constitute 
significant factors. 
analysis. These technologies enable our model to make dynamic observations of
the factors determined as relevant/significant by the data collected. 

These observations, captured by the monitoring modules and converted into 
a context representation, and the given utterance at hand, i.e. the parser output, 
constitute the input into our belief network. The resulting output constitutes a 
measurement of the situational coherence of the possible alternative readings. 
In other words it represents a list of ranked construals, e.g. a ranked list of two 
decisions for a given Where interrogative with their corresponding situational 
coherence scores (e.g. (probability (instruct), 0.64223 p(true | evidence) 0.35777
p(false | evidence))). This can then be employed to interpret requests accordingly,
i.e., the parser output is either converted into the system’s representation of an 
instructional or localizational request. 

Results: As we have seen, the current baseline performance results in a
misinterpretation rate of 37% of the instructional requests of our initial data set. More
specifically, all requests of type B and E will falsely be interpreted as
localizational requests and type F is not recognized at all, causing the system
to indicate non-understanding. The context-adaptive enhancement described
herein lowers the error rate to 8%, which, in our minds, constitutes a
significant improvement. If additional data indicate that we can treat Existential
Interrogatives in a similar fashion, this would result in an additional lowering 
by 6%, leaving only 2% of the initial data set as unanalyzable for the system. 

4 Ontological Coherence 

As we have seen above, one of the fundamental issues concerning pragmatic ambiguity is enabling dialogue systems to pick the most appropriate reading given the contextual factors at hand. This is equally true for ambiguities that arise during semantic interpretation and automatic speech recognition.



4.1 Speech Recognition Ambiguities: N-Best Lists 

A common phenomenon found in different fields of NLP, e.g. automatic speech recognition, information retrieval or question answering, is that current processing techniques seem to hit a performance ceiling. In ASR, systems have progressed to a level where they are close to extracting as much information as possible from the acoustic stream. Some context-dependent features have been added to handle dialect and speaker adaptation, and dynamic lexica to handle novel input (Rapp et al. 2000). However, neither ontological nor situational knowledge is taken into account, which leaves the known problem of dealing with phonetically indistinguishable input unresolved. The classic example in the community is that a large-vocabulary speech recognition (LVSR) system, as needed for more conversational dialogue systems, could hardly differentiate between near-homophonous utterances such as “it is hard to wreck a nice beach” and “it is hard to recognize speech”. Humans, on the other hand, hear either one or the other depending on the context.
Today’s LVSR systems rarely feature a simple one-best hypothesis as the interface between ASR and NLU. While that may suffice for restricted dialogue systems, most systems either operate on n-best lists as ASR output or convert ASR word graphs (Oerder and Ney 1993) into n-best lists, given the distribution of acoustic and language model scores (Schwartz 1990). In our data, a user expressed in Example (1) the wish to see a specific city map again, leading to the top two speech recognition hypotheses (1a) and (1b). Annotators found that Example (1a) constituted a largely well-formed representation of the utterance, whereas Example (1b) constituted an inadequate representation thereof:



(1)  Ich würde die Karte gerne wiedersehen
     I would the map like to see again
     'I would like to see the map again'

(1a) Ich würde die Karte eine wieder sehen
     I would the map one again see

(1b) Ich würde die Karte eine Wiedersehen
     I would the map one Good Bye



Facing multiple representations of a single utterance consequently raises the question of which of the different hypotheses most likely corresponds to the user’s utterance. Several ways of solving this problem have been proposed and implemented in various systems, e.g. using scores provided by the ASR system (i.e. acoustic and language model probabilities) or scores provided by the natural language understanding and discourse modeling components, cf. Litman et al. (1999).

We claim that contextual extra-linguistic knowledge can also be used at this point to provide further information and to help solve this task, especially in those cases where ASR and semantic scores fail. In the following, we report on the experimental setup and evaluations of this claim, thereby introducing the central notion of ontological coherence.
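One simple way to let a contextual score help where acoustic and language model scores are nearly tied is a log-linear combination over the n-best list. The sketch below assumes such a combination; the weights and all scores are invented placeholders (not SmartKom output), and the hypotheses are taken from Example (1).

```python
import math

# Toy n-best list for Example (1). All scores are invented placeholders:
# "am" and "lm" are log-probabilities, "coherence" lies in (0, 1], higher = better.
N_BEST = [
    {"text": "Ich würde die Karte eine Wiedersehen", "am": -120.0, "lm": -35.0, "coherence": 0.2},
    {"text": "Ich würde die Karte eine wieder sehen", "am": -120.5, "lm": -35.2, "coherence": 0.9},
]

# Hypothetical interpolation weights; in practice they would be tuned on held-out data.
WEIGHTS = {"am": 1.0, "lm": 1.0, "coherence": 5.0}


def combined_score(hyp: dict) -> float:
    """Log-linear combination; the coherence score enters via its logarithm."""
    return (WEIGHTS["am"] * hyp["am"]
            + WEIGHTS["lm"] * hyp["lm"]
            + WEIGHTS["coherence"] * math.log(hyp["coherence"]))


def select_best(n_best: list[dict]) -> dict:
    """Pick the hypothesis with the highest combined score."""
    return max(n_best, key=combined_score)


print(select_best(N_BEST)["text"])
```

With these toy numbers the acoustic and language model scores alone would favour (1b), while adding the coherence term selects (1a).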

The Data: An initial experiment was reported in Gurevych et al. (2002), where we tested whether human annotators could reliably classify 2300 speech recognition hypotheses (SRHs) in terms of their ontological coherence, i.e. whether a given hypothesis constitutes an internally coherent utterance. In more recent experiments on an additional corpus of 1400 hypotheses, we showed that annotators could also reliably (>94%) identify the best hypothesis, given a transcribed utterance and the corresponding SRH choices.

Requirements and Application: The corresponding contextual analysis, then, needs to provide a coherence score automatically, which any NLU system can employ to select the best hypothesis from the n-best list, either independently or in conjunction with acoustic or statistical scores. We employ the OntoScore system described in Gurevych et al. (2003): given a frame- and description-logic-based ontology (e.g. with semantics as defined in oil-rdfs, daml+oil or owl; see www.daml.org or www.w3c.org/rdf), we map words to concepts and compute the average path length of the shortest graph connecting all concepts, excluding isa relations from the individual path-length measures; we extended this measure with a conceptual context addition.

Results: Using the SmartKom system and its pre-existing ontology, OntoScore correctly assigns the highest score to the best hypothesis, as defined in the merged human gold standard, in over 84% of the cases (baseline 63.91%). This coherence measuring method has, therefore, been shown to perform much better than the baseline on an additional task, and it performs as well as or better than the alternative scoring methods.
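The path-length idea behind OntoScore can be sketched with breadth-first search over an ontology graph. This is a simplified stand-in, not the OntoScore implementation: it averages pairwise shortest-path lengths instead of computing the shortest connecting graph, omits the conceptual context addition, and uses a hypothetical toy ontology and a hypothetical penalty for unconnected concepts. Lower values mean more coherent.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy ontology as (concept, relation, concept) triples.
EDGES = [
    ("Map", "shows", "City"),
    ("SeeProcess", "hasTheme", "Map"),
    ("SeeProcess", "hasAgent", "User"),
    ("Farewell", "hasAgent", "User"),
    ("Map", "isa", "Document"),
    ("Farewell", "isa", "Process"),
    ("SeeProcess", "isa", "Process"),
]


def build_graph(edges, skip=("isa",)):
    """Undirected adjacency map; relations listed in `skip` are left out."""
    graph = {}
    for a, rel, b in edges:
        if rel in skip:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns None if `goal` is unreachable."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def coherence(graph, concepts, penalty=10):
    """Average pairwise shortest-path length (lower = more coherent).
    Unreachable pairs count as `penalty`, a hypothetical constant."""
    pairs = list(combinations(concepts, 2))
    if not pairs:
        return 0.0
    total = 0
    for a, b in pairs:
        d = shortest_path_length(graph, a, b)
        total += penalty if d is None else d
    return total / len(pairs)


graph = build_graph(EDGES)
# Concepts a hypothesis like (1a) might map to, versus one like (1b).
print(coherence(graph, ["User", "Map", "SeeProcess"]))
print(coherence(graph, ["User", "Map", "Farewell"]))
```

Excluding isa edges matters here: through the shared superconcept Process, Farewell and SeeProcess would otherwise look artificially close.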



4.2 Semantic Interpretation and Construal 

In much the same way, ontological coherence can be employed to disambiguate between multiple representations of a user’s input. We will show how it can assist in semantic interpretation, i.e. in resolving semantic construal, which underlies many non-syntactic phenomena involving unconventional meaning (Langacker, 1998). Consider the following simple examples from our tourism domain:

(2) Goethe often visited the historical museum. 

(3) The Palatine museum was moved to a new location in 1951. 

(4) The apothecary museum was renovated in 1983. 

(5) In 1994 the museum bought a new Matisse. 

Here we find four noun phrases featuring the word museum as an argument of four different main verbs: visited, moved, renovated and bought. Although linguistic analyses may vary in their classifications, Example (2) would commonly be regarded as conventional, Examples (3) and (4) as polysemous, and Example (5) as metonymical language use with respect to the word museum. In many cases we also find lexical ambiguities, as for kommen in the SRHs of Example (6) and Example (7):

(6) was für Spielfilme kommen heute Abend

what for films come today evening 

(7) wie kann ich mir zur Schloss kommen 

how can I me to castle come 

Due to the pervasiveness of construal in natural language, a formal model thereof, as well as an account of its mechanisms, constitutes an important part of any approach to natural language understanding.

Data: As shown in Poesio (2002), about 50% of all noun phrases in their corpora are discourse-new and 30% are anaphoric; the remaining 20% are so-called associative expressions. In an additional experiment, annotators labeled the correct word senses for all cases (1415 markables) of multiple word-to-concept mappings. For example, in SRHs containing forms of the verb kommen (to come / to be showing), a decision had to be made whether it is a MotionDirectedTransliterated, as in Example (7), or a WatchPerceptualProcess, as in Example (6), or undecidable, which was only the case in non-best SRHs (see Sect. 4.1).

Requirements and Application: Previous work on resolving ambiguities, metonymic language use and other types of associative meaning (Poesio, 2002) also employs various kinds of hierarchical knowledge bases and shows promising results in domain-specific settings. The actual content of an ontology depends on the specific modeling choices made while constructing it. Due to individual differences and even internally heterogeneous modeling choices, we need more flexible algorithms for retrieving the appropriate information from the knowledge base than those employed in previous approaches. Additionally, the semantic web projects bring forth a multitude of external ontologies whose modeling choices need not be known beforehand. Yet, if dialogue systems intend to profit from this undertaking, they will need to be able to extract the necessary information without knowing the specific modeling choices. As proposed in Porzel and Bryant (2003), an extra-linguistic knowledge store (an ontology) can be employed to find sets of alternative readings by searching the conceptual graph in ways as permissive as radial categories suggest. These ontological substitutions complement the ambiguity mappings from the lexicon.

This search has been interfaced with the ontological coherence scoring application, i.e. OntoScore, to calculate how often contextual coherence picks the appropriate reading. To aid semantic interpretation by means of contextual knowledge, we can apply the same algorithm employed to score sets of speech recognition hypotheses to score different potential trigger-target pairings with respect to their ontological coherence. For metonymy or bridging resolution, however, an initial processing step is needed to find sets of possible pairings, i.e. candidates that are potentially more ontologically coherent.
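Reusing the coherence idea for lexical ambiguity, each candidate concept for an ambiguous word can be scored by how close it lies to the other concepts of the utterance. The sketch assumes the candidates come from a lexicon mapping (it does not implement the permissive radial-category search of Porzel and Bryant); the graph fragment, the `LEXICON` entry and the unreachable-distance constant are hypothetical, while the two concept names are those used in the annotation.

```python
from collections import deque

# Hypothetical ontology fragment as an undirected adjacency map (isa links omitted).
GRAPH = {
    "WatchPerceptualProcess": {"Broadcast"},
    "Broadcast": {"WatchPerceptualProcess", "Film", "TimeInterval"},
    "Film": {"Broadcast"},
    "TimeInterval": {"Broadcast"},
    "MotionDirectedTransliterated": {"Route"},
    "Route": {"MotionDirectedTransliterated", "Castle"},
    "Castle": {"Route"},
}

# Hypothetical lexicon entry mapping an ambiguous word to its candidate concepts.
LEXICON = {"kommen": ["MotionDirectedTransliterated", "WatchPerceptualProcess"]}


def distances_from(source):
    """Breadth-first distances from `source` to every reachable concept."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        node = queue.popleft()
        for nxt in GRAPH.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def disambiguate(word, context_concepts, unreachable=10):
    """Choose the candidate concept with the smallest summed distance to the context."""
    def cost(candidate):
        dist = distances_from(candidate)
        return sum(dist.get(c, unreachable) for c in context_concepts)
    return min(LEXICON[word], key=cost)


print(disambiguate("kommen", ["Film", "TimeInterval"]))  # Example (6): films tonight
print(disambiguate("kommen", ["Castle"]))                # Example (7): getting to the castle
```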

Results: Measuring the ontological coherence of the conceptual representations yields a corresponding ranking of the alternative readings. Looking at the case of kommen as showing (on TV) versus coming (to/from), given a pre-existing ontology we find 85 occurrences of this ambiguity. The contextually enhanced OntoScore picked the correct reading in 72 cases and the incorrect one in 2 cases; its choices were mixed in all 11 undecidable cases, none of which were in the best SRH set. The baseline, given the majority distribution, was 56.5%.
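As a worked reading of these counts (our own arithmetic, not figures reported by the authors), the 72 correct choices correspond to the following accuracies, depending on whether the 11 undecidable cases are counted as misses or excluded:

$$
\frac{72}{85} \approx 0.847 \quad \text{(all occurrences)}, \qquad
\frac{72}{72 + 2} = \frac{72}{74} \approx 0.973 \quad \text{(decidable cases only)},
$$

both of which lie well above the majority-class baseline of 0.565.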

Including such contextual interpretation during and before semantic interpretation can enable natural language understanding systems to become more conversational without losing the reliability of restricted dialogue systems. Our work on combining the situational coherence measures reported in Sect. 3 with ontological coherence and discourse coherence has already shown an increase in performance on multiple tasks. We are, therefore, strongly encouraged by these results to regard this approach as a suitable path towards making natural language processing more robust and human-like.

Footnote: It is certainly feasible to limit bridging or metonymy resolution to a predefined set of ontological relations, such as has-part relations, if the ontology was crafted especially for that type of resolution (Poesio, 2002).
5 Contextual Coherence

Extra-linguistic factors relate not only to the situational context, but also to the other context stores, such as the discourse, interlocutionary and ontological context. For an integrated model of common-sense-based contextual coherence, we have introduced a way of integrating diverse knowledge sources into belief networks by establishing a set of intermediate nodes that form a decision panel. In such a panel, each weighted expert node votes on a common decision, e.g. the posterior probability of a Where interrogative being construed as a descriptive or an instructional request (or of the intended sense of museum), as viewed from:

- a situation expert observing, e.g., time, date, proximity, accessibility 

- a user expert observing, e.g., interests, transportation, thrift 

- a discourse expert observing, e.g., referential status, discourse accessibility 

- an ontological expert observing, e.g., object types and object roles 

The weights and votes of the experts are then combined to obtain posterior probabilities for the decision at hand that sum to 1. In the simple case of a single decision (i.e. instructive versus descriptive requests), we have seen that the model captures the data adequately and behaves accordingly. The full-blown model features situational factors as introduced in Sect. 3, as well as ontological factors, as input to the contextually enhanced OntoScore system. Its integration into SmartKom can be extended as collected data and monitoring capabilities, e.g. for the current weather conditions, become available. An additional reason for choosing these networks was that, even if they become rather complex, they are naturally robust against missing and uncertain data, relying on the priors in the absence of currently available topical data. This approach, therefore, offers a systematic and robust way of enabling natural language understanding modules to resolve different construals of conversational utterances via context-dependent analysis.
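The combination step of the decision panel can be approximated by a weighted linear opinion pool. This is a simplification under stated assumptions: the authors combine the experts inside a belief network, whereas the sketch simply averages hypothetical expert votes with hypothetical weights and renormalises, so the resulting posteriors sum to 1.

```python
# Hypothetical experts: each reports P(decision) over the same decision set.
# Weights and votes are illustrative only.
EXPERTS = {
    "situation":   {"weight": 0.4, "vote": {"instruct": 0.8, "describe": 0.2}},
    "user":        {"weight": 0.2, "vote": {"instruct": 0.6, "describe": 0.4}},
    "discourse":   {"weight": 0.2, "vote": {"instruct": 0.5, "describe": 0.5}},
    "ontological": {"weight": 0.2, "vote": {"instruct": 0.3, "describe": 0.7}},
}


def panel_decision(experts, available=None):
    """Weighted linear pool of expert votes, renormalised so the result sums to 1.

    Experts not listed in `available` are left out. This is only a crude
    stand-in for missing data; the belief network instead falls back on priors.
    """
    names = [n for n in experts if available is None or n in available]
    total_weight = sum(experts[n]["weight"] for n in names)
    decisions = next(iter(experts.values()))["vote"].keys()
    return {
        d: sum(experts[n]["weight"] * experts[n]["vote"][d] for n in names) / total_weight
        for d in decisions
    }


print(panel_decision(EXPERTS))
print(panel_decision(EXPERTS, available={"situation", "ontological"}))
```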

6 Conclusion 

In this work we focus primarily on contextual interpretation that makes NLP applications more reliable and conversational. We rely on two primary contextual knowledge stores: world knowledge and situational knowledge, captured herein by means of formal ontologies and belief networks, respectively. We argue that extra-linguistic knowledge, i.e. situational and ontological knowledge, can represent and integrate the diverse knowledge sources necessary for context-dependent natural language analysis. As a result, we showed a decrease in the number of misinterpretations and intention misrecognitions at three stages in the processing pipeline of an implemented dialogue system. This, in turn, increases the system’s performance on features crucial to user satisfaction evaluations, leading to measurable improvements in evaluation criteria such as task ease and expected behavior, as well as in dialogue metrics, due to a decrease in the number of turns necessary to achieve task completion.

Footnote: The addition of these intermediate nodes offers a systematic way of combining evidence from independent factors in belief networks and shrinks the conditional probability tables.

We introduce contextual coherence measurements, i.e. the output of situational and ontological coherence measurements. We employed these to find the best speech recognition hypotheses in n-best lists, to rank ambiguous, polysemous and metonymical readings, and to resolve pragmatic ambiguities via inferences from knowledge and belief models based on common-sense knowledge. The general model introduced here shows how such scores reflect a set of additional common-sense constraints that can be applied as semantic and pragmatic constraints alongside phonological or morpho-syntactic constraints. Where questions, for example, are cases in which all syntactic and semantic constraints are perfectly satisfied by a proposed filler, while pragmatic constraints concerning the accessibility of the goal object can be violated depending on the situational context.

Since the approach described herein results in ranked lists of possible construals for a given utterance, we can define a threshold for cases where the resulting scores are too close to call. If, for example, the difference between the posterior probabilities of the instruct and localize decisions lies between -0.1 and 0.1, the system can respond by asking the user: Do you want to go there or know where it is located?, which, incidentally, is also a response we found in our initial experiments. This, in turn, would make conversational dialogue systems more mixed-initiative, in addition to increasing their understanding capabilities and robustness.
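The threshold rule can be written down directly. The sketch assumes the margin is inclusive (a difference of exactly ±0.1 triggers the question), which the text leaves open; the function name is hypothetical.

```python
def choose_response(p_instruct: float, p_localize: float, margin: float = 0.1) -> str:
    """Commit to a construal, or ask for clarification when the two posteriors
    are too close, i.e. their difference lies within [-margin, margin]."""
    diff = p_instruct - p_localize
    if abs(diff) <= margin:
        return "Do you want to go there or know where it is located?"
    return "instruct" if diff > 0 else "localize"


print(choose_response(0.64223, 0.35777))  # instruct
print(choose_response(0.53, 0.47))        # clarification question
```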
Abstract. We propose a new data model intended for peer-to-peer (P2P) databases. The model assumes that each peer has a (relational) database and exchanges data with other peers (its acquaintances). In this context, one needs a data model that views the space of available data within the P2P network as an open collection of possibly overlapping and inconsistent databases. Accordingly, the paper proposes the Local Relational Model and develops a semantics for coordination formulas. The main result of the paper generalizes Reiter’s characterization of a relational database in terms of a first-order theory [...] by providing a syntactic characterization of a relational space in terms of a multi-context system. This work extends earlier work by Giunchiglia and Ghidini on Local Model Semantics [...].



1 Introduction 

Peer-to-peer (hereafter P2P) computing consists of an open-ended network of distributed 
computational peers, where each peer can exchange data and services with a set of other 
peers, called acquaintances. Peers are fully autonomous in choosing their acquaintances. 
Moreover, we assume that there is no global control in the form of a global registry, global 
services, or global resource management, nor a global schema or data repository of the 
data contained in the network. Systems such as Napster and Gnutella popularized the 
P2P paradigm as a version of distributed computing lying between traditional distributed 
systems and the web. The former is rich in services but requires considerable overhead 
to launch and has a relatively static, controlled architecture. The latter is a dynamic, 
anyone-to-anyone architecture with low startup costs but limited services. By contrast,
P2P offers an evolving architecture where peers come and go, choose whom they deal 
with, and enjoy some traditional distributed services with less startup cost. 

We are interested in data management issues raised by this paradigm. In particular, 
we assume that each peer has data to share with other nodes. To keep things simple, 
we further assume that these data are stored in a local relational database for each peer. 
Since the data residing in different databases may have semantic inter-dependencies, 
we require that peers can specify coordination rules which ensure that the contents of 
their respective databases remain “coordinated” as the databases evolve. For example, 
the patient database of a family doctor and that of a pharmacist may want to coordinate their information about a particular patient, the prescriptions she has been given, the dates when these prescriptions were filled and the like. Coordination may mean something as simple as propagating all updates to the PRESCRIPTION and MEDICATION relations, assumed to exist in both databases. In addition, we’d like to support
query processing so that a query expressed with respect to one database fetches infor- 
mation from other relevant databases as well. To accomplish this, we expect the P2P data 
management system to use coordination rules as a basis for recursively decomposing the 
query into sub-queries which are translated and evaluated with respect to the databases 
of acquaintances. 

Consider the patient databases example, again. There are several databases that store 
information about a particular patient (family doctor, pharmacist, hospitals, specialists).
These databases need to remain acquainted and coordinate their contents for every shared 
patient. Since patients come and go, coordination rules need to be dynamic and are 
introduced by mutual consent of the peers involved. Acquaintances are dynamic too. If 
a patient suffers an accident during a trip, new acquaintances will have to be introduced 
and will remain valid until the patient’s emergency treatment is over. 

In such a setting, we cannot assume the existence of a global schema for all the 
databases in a P2P network, or just those of acquainted databases. Firstly, it is not clear 
what a global schema means for the whole network, given that the network is open-ended 
and continuously evolves. Secondly, even if the scope of a global schema made sense, it 
would not be practical to build one (just think of the effort and time required). Finally, building a global schema for every peer and her acquaintances isn’t practical either, as acquaintances keep changing. This means that current approaches to information integration [...] are not applicable, because they assume a global schema (and a global semantics) for the total data space represented by the set of peer databases.

Instead, the Local Relational Model (hereafter LRM) proposed here only assumes the 
existence of pairwise-defined domain relations, which relate synonymous data items, as 
well as coordination formulas, which define semantic dependencies among acquainted 
databases. The LRM is an evolution of a first attempt in this direction presented in [...], whose main limitation lay in the languages adopted to express peer coordination. Among other things, LRM allows for inconsistent databases and
supports semantic interoperability in a manner to be spelled out precisely herein. The 
main objective of this paper is to introduce the LRM, focusing on its formal semantics. 

The LRM semantics presented in this paper are an extension of Local Model Semantics, a semantics motivated by the problem of formalizing contextual reasoning in AI [...], which was first introduced in [...].

2 A Motivating Scenario 

Consider, again, the example of patient databases. Suppose that the Toronto General 
Hospital owns the Tgh database with schema: 

Patient(TGH#, OHIP#, Name, Sex, Age, FamilyDr, PatRecord)
PatientInfo(OHIP#, Record)
Admission(AdmID, OHIP#, AdmDate, ProblemDesc, PhysID, DisDate)
Treatment(TreatID, TGH#, Date, TreatDesc, PhysID)
Medication(TGH#, Drug#, Dose, StartD, EndD)

The database identifies patients by their hospital ID and keeps track of admissions, 
patient information obtained from external sources, and all treatments and medications 
administered by the hospital staff. When a new patient is admitted, the hospital may want to establish an acquaintance with her family doctor immediately. Suppose the view exported by the family doctor DB (say, Davis) has schema:

Patient(OHIP#, FName, LName, Phone#, Sex, PatRecord)
Visit(OHIP#, Date, Purpose, Outcome)
Prescription(OHIP#, Med#, Dose, Quantity, Date)
Event(OHIP#, Date, Description)

Figuring out patient record correspondences (i.e., doing object identification) is achieved by using the patient’s Ontario Health Insurance # (e.g., OHIP# = 1234). Initially, this acquaintance has exactly one coordination formula, which states that if there is no patient record at the hospital for this patient, then the patient’s record from Davis is added to Tgh in the PatientInfo relation, which can be expressed as:

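$$\forall(\mathit{Davis}: fn, ln, pn, sex, pr).\big(\mathit{Davis}: \mathit{Patient}(1234, fn, ln, pn, sex, pr) \rightarrow \mathit{Tgh}: \exists(tghid, n, a).(\mathit{Patient}(tghid, 1234, n, sex, a, \mathit{Davis}, pr) \wedge n = \mathit{concat}(fn, ln))\big) \qquad (1)$$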

In the above formula, the syntax “$\forall(\mathit{Davis}: fn, ln, pn, sex, pr).\ldots$” is a quantification of the variables fn, ln, pn, sex, pr in the domain of Davis; analogously, the syntax $\mathit{Davis}: \mathit{Patient}(1234, fn, ln, pn, sex, pr)$ states the fact that the tuple $(1234, fn, ln, pn, sex, pr)$ belongs to the relation Patient of the database Davis.
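The import prescribed by formula (1) instantiates the existential variables concretely: a freshly generated TGH# for tghid, the Skolem constant <undef-age> for a, and the concatenation of first and last name for n. The sketch below assumes in-memory sets of tuples as stand-ins for the two peers' relations; the sample patient, the counter used to generate TGH numbers and the space separator in the concatenation are hypothetical, and the "no existing record" check follows the prose description rather than the formula itself.

```python
import itertools

# Hypothetical in-memory stand-ins for the peers' relations.
# Davis: Patient(OHIP#, FName, LName, Phone#, Sex, PatRecord)
davis_patient = {(1234, "Ann", "Smith", "555-0100", "F", "rec-17")}
# Tgh: Patient(TGH#, OHIP#, Name, Sex, Age, FamilyDr, PatRecord)
tgh_patient = set()

_next_tgh_id = itertools.count(1)  # hypothetical generator of fresh TGH numbers
UNDEF_AGE = "<undef-age>"          # Skolem constant for the unknown age


def apply_formula_1(ohip: int = 1234) -> None:
    """Copy Davis's record for `ohip` into Tgh unless Tgh already has one."""
    if any(row[1] == ohip for row in tgh_patient):
        return
    for o, fn, ln, pn, sex, pr in davis_patient:
        if o == ohip:
            tghid = next(_next_tgh_id)  # witness for the existential tghid
            name = fn + " " + ln        # n = concat(fn, ln); separator assumed
            tgh_patient.add((tghid, ohip, name, sex, UNDEF_AGE, "Davis", pr))


apply_formula_1()
apply_formula_1()  # second call is a no-op: the record now exists at Tgh
print(tgh_patient)
```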

When Tgh imports data from Davis, the existentially quantified variables tghid, n and a must be instantiated with concrete elements of the domain of the Tgh database: by generating a new TGH# for tghid, by inserting the Skolem constant <undef-age> for a, and by instantiating n with the concatenation of fn (first name) and ln (last name) contained in Davis. Later, if patient 1234 is treated at the hospital for some time, further coordination formulas might be set up that update the Event relation for every treatment or medication she receives:

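$$\forall(\mathit{Tgh}: d, desc).\big(\mathit{Tgh}: \exists(tid, tghid, pid, n, sex, a, pr).(\mathit{Treatment}(tid, tghid, d, desc, pid) \wedge \mathit{Patient}(tghid, 1234, n, sex, a, \mathit{Davis}, pr)) \rightarrow \mathit{Davis}: \mathit{Event}(1234, d, desc)\big) \qquad (2)$$

$$\forall(\mathit{Tgh}: tghid, drug, dose, sd, ed).\big(\mathit{Tgh}: (\mathit{Medication}(tghid, drug, dose, sd, ed) \wedge \exists n, sex, a, pr.\,\mathit{Patient}(tghid, 1234, n, sex, a, \mathit{Davis}, pr)) \rightarrow \mathit{Davis}: \forall d.(sd \le d \le ed \rightarrow \exists desc.(\mathit{Event}(1234, d, desc) \wedge desc = \mathit{concat}(drug, dose, \text{"atTGHDB"})))\big) \qquad (3)$$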
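Formula (3) asks Davis to hold one Event tuple for every date in a medication period. The sketch below enumerates those tuples, assuming day granularity and inclusive bounds (the comparison operators in the extracted formula are not fully legible); the drug name, dose and dates are hypothetical.

```python
from datetime import date, timedelta


def medication_events(ohip: int, drug: str, dose: str, start: date, end: date):
    """Event(OHIP#, Date, Description) tuples implied by formula (3):
    one per day d with start <= d <= end."""
    events = []
    d = start
    while d <= end:
        desc = f"{drug}{dose}atTGHDB"  # desc = concat(drug, dose, "atTGHDB")
        events.append((ohip, d, desc))
        d += timedelta(days=1)
    return events


evts = medication_events(1234, "Amoxicillin", "500mg", date(2003, 6, 23), date(2003, 6, 25))
print(len(evts))  # 3
```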



This acquaintance is dropped once the patient’s hospital treatment is over. Along similar 
lines, the patient’s pharmacy may want to coordinate with Davis. This acquaintance 
is initiated by Davis when the patient tells Dr. Davis which pharmacy she uses. Once 
established, the patient’s name and phone are used for identification. The pharmacy 
database (say, Allen) has the schema: 

Prescription(Prescr#, CustName, CustPhone#, DrugID, Dose, Repeats)
Sales(CustName, CustPhone#, DrugID, Dose, Date, Amount)
Here, we want Allen to remain updated with respect to prescriptions in Davis:

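$$\forall(\mathit{Davis}: fn, ln, pn, med, dose, qt).\big(\mathit{Davis}: \exists ohip, date, sex, pr.(\mathit{Prescription}(ohip, med, dose, qt, date) \wedge \mathit{Patient}(ohip, fn, ln, pn, sex, pr)) \rightarrow \mathit{Allen}: \exists cn, amount.(\mathit{Prescription}(cn, pn, med, qt, dose, amount) \wedge cn = \mathit{concat}(fn, ln))\big) \qquad (4)$$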

Of course, this acquaintance is dropped when the patient tells her doctor that she has changed pharmacy. Suppose the hospital has no information on its new patient with OHIP# 1234 and needs to find out whether she is receiving any medication. Here, the hospital uses its acquaintance with the Toronto pharmacies association, say TPhLtd. TPhLtd is a peer that has acquaintances with most Toronto pharmacists and has a coordination formula that allows it to access prescription information in those pharmacists’ databases. For example, if we assume that TPhLtd consists of a single relation

Prescription(Name, Phone#, DrugID, Dose, Repeats)

then the coordination formula between the two databases might be:

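$$\forall(\mathit{Davis}: fn, ln, pn, med, dose).\big(\mathit{Davis}: \exists ohip, qt, date, sex, pr.(\mathit{Prescription}(ohip, med, dose, qt, date) \wedge \mathit{Patient}(ohip, fn, ln, pn, sex, pr)) \rightarrow \mathit{TPhLtd}: \exists name, rep.(\mathit{Prescription}(name, pn, med, dose, rep) \wedge name = \mathit{concat}(fn, ln))\big) \qquad (5)$$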

Analogous formulas exist for every other pharmacy acquaintance of TPhLtd. Apart from 
serving as information brokers, interest groups also support mechanisms for generating 
coordination formulas from parameterized ones, given exported schema information for 
each pharmacy database. On the basis of this formula, a query such as “All prescriptions for patient with name N and phone# P” evaluated with respect to TPhLtd will be translated into queries that are evaluated with respect to databases such as Allen. The acquaintance between the hospital and TPhLtd is more persistent than those mentioned earlier. However, this one too may evolve over time, depending on what pharmacy information becomes available to TPhLtd. Finally, suppose the patient in question takes a trip to Trento and suffers a skiing accident. Now the Trento Hospital database (TNgh) needs information about the patient from Davis. This is a transient acquaintance that only involves making the patient’s record available to TNgh and updating the Event relation in Davis.
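The query translation performed by TPhLtd can be pictured as fanning a query out to its pharmacy acquaintances and reshaping each local answer into TPhLtd's own Prescription layout. The sketch assumes every acquaintance exports Allen's Prescription(Prescr#, CustName, CustPhone#, DrugID, Dose, Repeats) schema; the second pharmacy and all rows are hypothetical, and real LRM query processing would derive the sub-queries from the coordination formulas rather than hard-coding them.

```python
# Hypothetical acquaintances of TPhLtd, each exporting
# Prescription(Prescr#, CustName, CustPhone#, DrugID, Dose, Repeats).
PHARMACIES = {
    "Allen": [(1, "Ann Smith", "555-0100", "D42", "500mg", 2)],
    "OtherPharmacy": [(7, "Ann Smith", "555-0100", "D17", "10mg", 0),
                      (8, "Bob Lee", "555-0199", "D42", "250mg", 1)],
}


def query_pharmacy(rows, name, phone):
    """Local sub-query at one pharmacy, returning tuples in TPhLtd's
    Prescription(Name, Phone#, DrugID, Dose, Repeats) layout."""
    return [(cn, ph, drug, dose, rep)
            for _prescr, cn, ph, drug, dose, rep in rows
            if cn == name and ph == phone]


def tphltd_prescriptions(name, phone, acquaintances=PHARMACIES):
    """Answer "all prescriptions for name N and phone# P" at TPhLtd by
    issuing one sub-query per pharmacy acquaintance and taking the union."""
    results = []
    for rows in acquaintances.values():
        results.extend(query_pharmacy(rows, name, phone))
    return results


print(tphltd_prescriptions("Ann Smith", "555-0100"))
```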



3 Relational Spaces 

Traditionally, federated and multi-database systems have been treated as extensions of conventional databases. Unfortunately, formalizations of the relational model (such as [...]) hardly apply to these extensions, where there are multiple overlapping and heterogeneous databases, which may be inconsistent and may use different vocabularies and different domains. As a first step towards implementation solutions for the scenario described in the previous section, we formalize the LRM.
The model-theoretic semantics for LRM is defined in terms of relational spaces, each of which models the state of the databases in a P2P system. These are mathematical structures generalizing the model-theoretic semantics for the Relational Model, as defined by Reiter in [...]. Coordination between databases in a relational space is expressed in terms of coordination formulas that describe dependencies between a set of databases. Let us start by recalling Reiter’s key concepts.

Definition 1 (Relational Language). A relational language is a first-order language L with equality, a finite set of constants, denoted dom, no function symbols, and a finite set R of predicate symbols.

The set dom of constants is called the domain and represents the total set of data contained in a database, while the predicates in R represent its relations. For instance, the language of Davis contains the constant symbol 1234, relational symbols such as Patient, and the unary predicates OHIP#, FName, LName, Phone#, Sex and PatRecord; a(Patient) = (OHIP#, FName, LName, Phone#, Sex, PatRecord).

We use the notation $\mathbf{x}$ for a sequence of variables $(x_1, \ldots, x_n)$ and $\mathbf{d}$ for a sequence of elements $(d_1, \ldots, d_n)$, each of which belongs to the domain dom; $\phi(x)$ is a formula with the free variable $x$, and $\phi(\mathbf{x})$ is a formula with free variables in $\mathbf{x}$.

Definition 2 (Relational Database). A relational database is a first-order interpretation m of a relational language L on the set of constants dom, such that m(d) = d for every constant d of L.

Definition 2 does not properly represent partial databases, i.e., databases that contain null values or partial tuples. Indeed, if m is a relational database, then for every formula $\phi$ either $m \models \phi$ or $m \models \neg\phi$ (where “$\models$” stands for first-order satisfiability). In an incomplete database we would like to have, for instance, that neither $\phi$ nor $\neg\phi$ is true. A common approach is to model incomplete databases as a set of first-order structures, also called a state of information. We follow this approach and formalize an incomplete database on a relational language L as a set of relational databases on L. Notice that the relational databases corresponding to an incomplete database all share the same domain, consisting of the set of constants contained in the database. The partiality, therefore, concerns only the interpretation of the relational symbols. With this generalization we can capture inconsistent, complete and incomplete databases. For instance, if $db_a$, $db_b$ and $db_c$ are three (partial) relational databases defined as

$$db_a = \{m_1\}, \qquad db_b = \{m_2, m_3\}, \qquad db_c = \emptyset$$

where $m_1$, $m_2$ and $m_3$ are relational databases, then they are, respectively, complete, incomplete and inconsistent. Generally, $db_i$ is complete if $|db_i| = 1$, incomplete if $|db_i| > 1$, and inconsistent if $db_i = \emptyset$.
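The set-of-interpretations view of a database can be made concrete for ground atoms. The sketch assumes each relational database (interpretation) is a frozenset of true ground atoms with hypothetical atom encodings; it only evaluates single atoms, not full first-order formulas. An atom holds in a database if it holds in every member interpretation, so an inconsistent (empty) database makes every atom true vacuously, and an incomplete one can leave an atom undetermined.

```python
from enum import Enum


class Status(Enum):
    COMPLETE = "complete"
    INCOMPLETE = "incomplete"
    INCONSISTENT = "inconsistent"


def status(db):
    """Classify a database given as a set of interpretations."""
    if len(db) == 0:
        return Status.INCONSISTENT
    return Status.COMPLETE if len(db) == 1 else Status.INCOMPLETE


def holds(db, atom):
    """True if `atom` holds in every interpretation, False if in none,
    None (unknown) otherwise; vacuously True for an empty db."""
    values = {atom in m for m in db}
    if values <= {True}:
        return True
    if values == {False}:
        return False
    return None


# Hypothetical interpretations: frozensets of ground atoms.
m1 = frozenset({("Patient", 1234)})
m2 = frozenset({("Patient", 1234)})
m3 = frozenset()
db_a, db_b, db_c = {m1}, {m2, m3}, set()

print(status(db_a), status(db_b), status(db_c))
print(holds(db_a, ("Patient", 1234)), holds(db_b, ("Patient", 1234)))
```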

Since we are interested in modelling P2P applications, we take a further step and consider, rather than a single database, a family of databases indexed by a set of peers I. We call each of these databases a local database when we want to stress that it is a member of a set of (coordinated) databases.

When we consider a set of databases, the same information could be represented twice, in two different databases. In this case we say that they overlap. Overlapping databases have nothing to do with the fact that the same symbols appear in both databases (the same constant can have completely different meanings in two databases); overlap occurs when the real-world entities denoted by a symbol in different databases are somehow related. To represent the overlap of two local databases, one may use a global schema with suitable mappings to/from each local database schema. As argued earlier, this is not feasible in a P2P setting. Instead, we adopt a localized solution to the overlap problem, defined in terms of pairwise mappings from the elements of the domain of database i to elements of the domain of database j.

Definition 3 (Domain relation). Let $L_i$ and $L_j$ be two relational languages, with
domains $dom_i$ and $dom_j$ respectively; a domain relation from i to j is any subset of
$dom_i \times dom_j$.



The domain relation $r_{ij}$ represents the ability of database j to import (and represent
in its domain) the elements of the domain of database i. In symbols, $r_{ij}(d_i) =
\{d_j \mid (d_i, d_j) \in r_{ij}\}$ represents the set of elements into which j translates the constant
$d_i$ of i's domain. In many cases domain relations are not one-to-one, for instance when
two databases represent a domain at different levels of detail. Nor are domain relations
necessarily symmetric, for instance when $r_{ij}$ represents a currency exchange, a rounding function,
or a sampling function. In a P2P setting, domain relations need only be defined for
acquainted pairs of peers. Domain relations between databases are conceptually analogous
to conversion functions between semantic objects, as defined in […]. The domain relation
defined above formalizes the case where a single attribute of one database is mapped
into a single attribute of another database. It is often the case, however, that two (or more)
attributes of a database correspond to a single attribute in another one. An obvious example is
when the attributes first-name and last-name in a database i are merged into the single
attribute name of a database j. Domain relations can be generalized to deal with these
cases by allowing, for instance, a domain relation $r_{i:(first\text{-}name,\,last\text{-}name),\,j:name}$ to be a
subset of $dom_i^2 \times dom_j$.
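A small Python sketch of domain relations as sets of pairs may help. The prices-in-cents versus whole-euros peers, the rounding rule and the helper names (`image`, `converse`, `is_injective`) are hypothetical. The point is only to show $r_{ij}(d)$, a relation that is not one-to-one, a reverse relation $r_{ji}$ that is not the inverse of $r_{ij}$, and a multi-attribute relation defined on pairs of values.

```python
from typing import Hashable, Set, Tuple

DomainRelation = Set[Tuple[Hashable, Hashable]]


def image(r: DomainRelation, d: Hashable) -> Set[Hashable]:
    """r_ij(d) = {d_j | (d, d_j) in r_ij}: the elements into which j translates d."""
    return {dj for (di, dj) in r if di == d}


def converse(r: DomainRelation) -> DomainRelation:
    """Swap the two components of every pair."""
    return {(dj, di) for (di, dj) in r}


def is_injective(r: DomainRelation) -> bool:
    """False if two distinct elements of dom_i share an image in dom_j."""
    targets = [dj for (_, dj) in r]
    return len(targets) == len(set(targets))


# Hypothetical peers: i stores prices in cents, j in whole euros.
dom_i = {199, 201, 349}
r_ij = {(c, (c + 50) // 100) for c in dom_i}  # round to the nearest euro
r_ji = {(e, e * 100) for e in {2, 3}}          # j converts euros back to cents

# Hypothetical multi-attribute relation: (first-name, last-name) of i -> name of j.
r_name = {(("Pippo", "Inzaghi"), "Pippo Inzaghi")}

print(image(r_ij, 199), image(r_ij, 201))   # {2} {2}: not one-to-one
print(r_ji == converse(r_ij))               # False: r_ji is not the inverse of r_ij
print(image(r_name, ("Pippo", "Inzaghi")))  # {'Pippo Inzaghi'}
```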

Example 1. Let us consider how domain relations can represent different data integration
scenarios. The situation where two databases have different but equivalent representations
of the same domain can be represented by taking $r_{ij}$ and $r_{ji}$ as the translation functions
from $dom_i$ to $dom_j$ and vice versa, namely $r_{ij} = r_{ji}^{-1}$. Likewise, disjoint domains can
be represented by having $r_{ij} = r_{ji} = \emptyset$. Transitive mappings between the domains of
three databases are represented by imposing $r_{13} = r_{12} \circ r_{23}$. Suppose instead that $dom_i$
and $dom_j$ are ordered according to two orders $<_i$ and $<_j$. A relation that satisfies the
property

$\forall d_1, d_2 \in dom_i,\; d_1 <_i d_2 \rightarrow \forall d'_1 \in r_{ij}(d_1), \forall d'_2 \in r_{ij}(d_2).\; d'_1 <_j d'_2$

formalizes a mapping which preserves the orders, such as currency exchange. Finally,
suppose that a peer with database i does not want to export any information about a certain
object $d_s$ in its database. To accomplish this, it is sufficient to ensure that the domain
relations from i to any other database j do not associate any element to $d_s$, namely
$r_{ij}(d_s) = \emptyset$.
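The scenarios of Example 1 can be checked mechanically. The following sketch, again with hypothetical data and helper names, composes relations (reading $r_{12} \circ r_{23}$ left to right, which we assume is the intended convention), tests the order-preservation property above, and hides an object by removing its pairs.

```python
import operator
from typing import Callable, Hashable, Set, Tuple

DomainRelation = Set[Tuple[Hashable, Hashable]]
Order = Callable[[Hashable, Hashable], bool]


def compose(r12: DomainRelation, r23: DomainRelation) -> DomainRelation:
    """r12 ∘ r23 read left to right: (a, c) whenever (a, b) ∈ r12 and (b, c) ∈ r23."""
    return {(a, c) for (a, b) in r12 for (b2, c) in r23 if b == b2}


def preserves_order(r: DomainRelation, lt_i: Order, lt_j: Order) -> bool:
    """d1 <_i d2 implies d1' <_j d2' for all d1' ∈ r(d1), d2' ∈ r(d2)."""
    return all(lt_j(e1, e2) for (d1, e1) in r for (d2, e2) in r if lt_i(d1, d2))


def hide(r: DomainRelation, d_s: Hashable) -> DomainRelation:
    """Remove every pair with source d_s, so that r(d_s) is empty."""
    return {(a, b) for (a, b) in r if a != d_s}


# Equivalent representations: r_ij is a translation function, r_ji its inverse.
r_ij = {(1, "one"), (2, "two")}
r_ji = {(b, a) for (a, b) in r_ij}

# Transitive mapping across three hypothetical peers.
r_12, r_23 = {("a", "b")}, {("b", "c")}
r_13 = compose(r_12, r_23)

# A hypothetical exchange rate of 2 preserves <; rounding cents to euros does
# not preserve the strict order, because 199 and 201 both become 2.
exchange = {(1, 2), (3, 6), (5, 10)}
rounding = {(199, 2), (201, 2), (349, 3)}

print(r_13)                                                 # {('a', 'c')}
print(preserves_order(exchange, operator.lt, operator.lt))  # True
print(preserves_order(rounding, operator.lt, operator.lt))  # False
print(hide({("s", 1), ("t", 2)}, "s"))                      # {('t', 2)}
```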



Definition 4 (Relational space). A relational space is a pair $\langle db, r\rangle$, where db is a set
of local relational databases on I and r is a function that associates to each $i, j \in I$ a
domain relation $r_{ij}$.
Example 2. A relational space modeling the states of the databases described in Section […]
is a pair



where the first component, the local databases, contains five sets of interpretations of the
relational languages associated with TGH, Davis, Allen, Tphh and TNgh, respectively;
and the second component, the domain relation, contains four domain relations between
those databases which have to coordinate according to constraints (…).

The fact that t = (1234, Pippo, Inzaghi, 444, M, Rec_23) is a tuple of the relation
PatRecord of the Davis database is formalized by requiring $t \in m(PatRecord)$ for
each interpretation $m \in db_{Davis}$.

The fact that t' = (TG64, 1234, “PippoInzaghi”, M, <undef-age>, Davis, Rec_23)
is a tuple of the relation Patient of the TGH database is represented by requiring that, for
each natural number n with 0 < n < MaxAge, $db_{TGH}$ contains a model m with
$t'[\langle undef\text{-}age\rangle/n] \in m(Patient)$ (where $t'[\langle undef\text{-}age\rangle/n]$ is the result of substituting n
for <undef-age> in t').

The fact that the TGH# 1234 uniquely identifies a patient in both TGH and Davis is
represented by requiring $r_{Davis,TGH}(1234) = r_{TGH,Davis}(1234) = \{1234\}$.
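The encoding of the unknown age can be sketched as follows. This is an illustration under our own assumptions: only the Patient relation is modelled, each model is a frozenset of rows, and MaxAge is set to 4 (a hypothetical value) so that the output stays small. No single row is certain, yet the TGH# is the same in every model.

```python
from typing import FrozenSet, Hashable, Set, Tuple

UNDEF = "<undef-age>"
Row = Tuple[Hashable, ...]


def substitute(t: Row, placeholder: str, value: Hashable) -> Row:
    """t[placeholder/value]: replace every occurrence of placeholder in t."""
    return tuple(value if v == placeholder else v for v in t)


def patient_models(t: Row, max_age: int) -> Set[FrozenSet[Row]]:
    """One Patient extension per admissible age n with 0 < n < max_age."""
    return {frozenset({substitute(t, UNDEF, n)}) for n in range(1, max_age)}


t_prime = ("TG64", 1234, "PippoInzaghi", "M", UNDEF, "Davis", "Rec_23")
db_tgh = patient_models(t_prime, max_age=4)  # hypothetical MaxAge = 4

# Certain knowledge = rows present in every model; the age makes every row uncertain.
certain = frozenset.intersection(*db_tgh)
print(len(db_tgh), certain)  # 3 frozenset()
# The TGH# 1234 is known in every model, whatever the age.
print(all(any(row[1] == 1234 for row in m) for m in db_tgh))  # True
```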

4 Coordination in Relational Spaces 

Two (or more) peers that want to coordinate with each other need a language in which they
can express the inter-dependencies between the information stored in their databases. For
this purpose, we define a declarative language by which it is possible to express semantic
relations between local databases. The formulas of this language, called coordination
formulas, can be used to describe cross-database views and cross-database constraints.

Definition 5 (Coordination formula). The set of coordination formulas CF on the
family of relational languages $\{L_i\}_{i \in I}$ is defined as follows, for each $i \in I$ and each
formula $\phi$ of $L_i$:*

$CF ::= i : \phi \mid CF \rightarrow CF \mid CF \wedge CF \mid CF \vee CF \mid \exists i : x.CF \mid \forall i : x.CF$

We use Greek letters $\phi$, $\psi$, … to denote formulas of any language $L_i$, $i \in I$, and Latin
capital letters A, B, and C to denote coordination formulas. The basic building blocks
of coordination formulas are expressions of the form $i : \phi$, which are called atomic
coordination formulas. An occurrence of a variable x in a coordination formula is a free
occurrence if it is not in the scope of a quantifier. Examples of coordination formulas
are shown in Section […].

To give an interpretation of coordination formulas in relational spaces, let us start by
considering Definition 5 in detail. Its first clause states that coordination formulas are defined
on the basis of atomic formulas of the form $i : \phi$, where $\phi$ is any formula of $L_i$. $i : \phi$
intuitively means “$\phi$ is true in database i” and its interpretation follows the standard

* The following precedence rules apply: $i : \ldots$ has the highest precedence, followed by quantifiers, then $\wedge$, then $\vee$, and finally $\rightarrow$. For instance, $\forall i : x.i : \phi \wedge j : \psi \rightarrow k : \theta \vee h : \eta$ stands for $((\forall i : x.(i : \phi)) \wedge (j : \psi)) \rightarrow ((k : \theta) \vee (h : \eta))$.
rules of first order logic. Thus, in particular, if $\phi$ is of the form $\forall x.\psi(x)$ or of the form
$\exists x.\psi(x)$, then its interpretation is given in terms of the possible assignments of x to
elements of $dom_i$.

The crucial observation for the evaluation of quantified formulas is that a free
occurrence of a variable can be quantified in four different ways: by $\forall x$ or $\exists x$ within an atomic
coordination formula (as in the first clause of Definition 5), and by $\forall i : x$ or $\exists i : x$ within a
coordination formula. In the two latter cases the index i tells us the domain where we interpret x. Thus,
the formula $\forall i : x.A(x)$ (where A(x) is a coordination formula, and not a formula of some $L_i$!) must
be read as “for all elements d of the domain $dom_i$, A is true for d”. Likewise, $\exists i : x.A(x)$
must be read as “there is an element in the domain $dom_i$ such that A is true”. The trick
is that A, being a coordination formula, may contain atomic coordination formulas of
the form $j : \phi(x)$, with $j \neq i$. For instance, in coordination formula (…) the variables
fn and ln occur free in a coordination formula with index TGH (the consequent of the
implication), while they are bound by the quantifiers $\forall(Davis : fn, ln, \ldots)$.

The intuition underlying the interpretation of quantified indexed variables is that, if
x is a variable being quantified with index i and occurring free in a coordination formula
with index j, then we must find a way to relate the interpretation of x in $dom_i$ to the
interpretation of x in $dom_j$ using the mapping defined by $r_{ij}$. More precisely, the
coordination formula $\forall i : x.j : P(x)$ means “for each object of $dom_i$, the corresponding
object w.r.t. the domain relation in $dom_j$ has the property P”. Thus, for instance, in
order to check whether the coordination formula

$\forall i : x.(i : P(x) \rightarrow j : Q(x) \wedge k : R(x))$ (6)

is true in a relational space, one has to consider all the assignments that associate to
the occurrence of x in $i : P(x)$ any element $d \in dom_i$, and to the occurrences of
x in $j : Q(x)$ and $k : R(x)$ any element of $r_{ij}(d)$ and $r_{ik}(d)$, respectively. Dually,
the coordination formula $\exists i : x.j : P(x)$ means “there is an element in $dom_j$ with property P
that corresponds w.r.t. the domain relation $r_{ji}$ to an element of $dom_i$”. Thus,
for instance, in order to check whether the coordination formula

$\exists i : x.(i : P(x) \wedge j : Q(x) \wedge k : R(x))$ (7)

is true in a relational space, one has to find an assignment that associates to the occurrence
of x in $i : P(x)$ an element d of $dom_i$, and to the occurrences of x in $j : Q(x)$ and
$k : R(x)$ two elements $d' \in dom_j$ and $d'' \in dom_k$, respectively, such that $d \in r_{ji}(d')$
and $d \in r_{ki}(d'')$.

Notice that in our explanation of universal quantification we used $r_{ij}$, while for
existential quantification we used $r_{ji}$. This asymmetry is necessary to maintain the dual
intuitive readings of existential and universal quantifiers. Indeed, the intuitive meaning
of the formula $\forall i : x.j : P(x)$ is “for all $d \in dom_i$, if $d' \in r_{ij}(d)$ then $d'$ is in P”,
which can be rephrased as the dual existential statement “there does not exist any element
$d' \in r_{ij}(d)$ which is not in P”. Notice that in this last sentence the quantification is on
the elements of $dom_j$, namely on the elements in the codomain of the domain relation
$r_{ij}$, just like in the explanation of Equation (7) above.

To formalize the intuitions given above concerning the interpretation of coordination
formulas, we need two notions. The first is the coordination space of a variable x in a
coordination formula. Intuitively, this is the set of indexes of the atomic coordination
formulas that contain a free occurrence of x. The coordination space is the set of domains
where x must be interpreted. Thus, for instance, the coordination space of x in
$i : P(x) \wedge j : Q(x) \wedge k : R(x)$ is $\{i, j, k\}$.

Definition 6 (Coordination space). The coordination space of a variable x in a
coordination formula A is a set of indexes $J \subseteq I$, defined as follows:

1. the coordination space of x in $i : \phi$ is $\{i\}$ if x occurs free in $\phi$ according to the usual
definition of free occurrence in a first order formula, and the empty set otherwise;

2. the coordination space of x in $A \circ B$ (for any connective $\circ$) is the union of the
coordination spaces of x in A and B;

3. the coordination space of x in $Qi : y.A$ (for any quantifier Q) is the empty set if x
is equal to y, and the coordination space of x in A otherwise.
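Definition 6 is a straightforward structural recursion. The sketch below is a hypothetical Python encoding of coordination formulas (the class names are ours). Atomic formulas do not parse the local formula $\phi$; they simply record its free variables, which we assume are computed by the usual first order definition.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Atom:
    """An atomic coordination formula i : phi, recorded with the free variables of phi."""
    index: str
    free_vars: FrozenSet[str]


@dataclass(frozen=True)
class BinOp:
    op: str  # "->", "and" or "or"
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Quant:
    q: str  # "forall" or "exists"
    index: str
    var: str
    body: "Formula"


Formula = Union[Atom, BinOp, Quant]


def coordination_space(x: str, a: Formula) -> FrozenSet[str]:
    """Definition 6, clause by clause."""
    if isinstance(a, Atom):   # clause 1
        return frozenset({a.index}) if x in a.free_vars else frozenset()
    if isinstance(a, BinOp):  # clause 2
        return coordination_space(x, a.left) | coordination_space(x, a.right)
    if isinstance(a, Quant):  # clause 3
        return frozenset() if a.var == x else coordination_space(x, a.body)
    raise TypeError(f"not a coordination formula: {a!r}")


X = frozenset({"x"})
# i : P(x) ∧ j : Q(x) ∧ k : R(x)
f = BinOp("and", BinOp("and", Atom("i", X), Atom("j", X)), Atom("k", X))
print(sorted(coordination_space("x", f)))  # ['i', 'j', 'k']
```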

The second notion is that of assignment for a free occurrence of a variable in a
coordination formula. To evaluate a formula A quantified over x with index i, an assignment
must consider not only $dom_i$ but also all the domains in the coordination space. To understand
how assignments work, look at Equations (6) and (7). In Equation (6) we proceed “forward”
from $dom_i$ to reach $dom_j$ and $dom_k$, by applying $r_{ij}$ and $r_{ik}$. In this case we say that we
have an i-to-{j, k}-assignment. Instead, in Equation (7), we proceed “backward” from
$dom_j$ and $dom_k$ to reach $dom_i$ by applying $r_{ji}$ and $r_{ki}$. In this case we say that we have
an i-from-{j, k}-assignment. If J is a coordination space, i-to-J-assignments take care
of the assignments due to universal quantification, while i-from-J-assignments take care
of those due to existential quantification.

Definition 7 (Assignment, x-variation, i-to-J-assignment, i-from-J-assignment).

An assignment $a = \{a_i\}_{i \in I}$ is a family of functions $a_i$, where $a_i$ assigns to any variable
x an element of $dom_i$. An assignment $a'$ is an x-variation of an assignment a if a and
$a'$ differ only on the assignments to the variable x. Given a set $J \subseteq I$ and an index
$i \in I$, an assignment a is an i-to-J-assignment of x if, for all $j \in J$ distinct from i,
$(a_i(x), a_j(x)) \in r_{ij}$. An assignment a is an i-from-J-assignment of x if, for all $j \in J$
distinct from i, $(a_j(x), a_i(x)) \in r_{ji}$.
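The two kinds of assignment can be written as simple membership checks. In this sketch an assignment is a dictionary `a[i][x]`, the domain relations are a dictionary keyed by `(i, j)`, and all names and data are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Assignment = Dict[str, Dict[str, Hashable]]  # a[i][x] ∈ dom_i
Relations = Dict[Tuple[str, str], Set[Tuple[Hashable, Hashable]]]  # r[(i, j)] = r_ij


def is_x_variation(a: Assignment, b: Assignment, x: str) -> bool:
    """True if a and b agree on every variable other than x, at every index."""
    for i in set(a) | set(b):
        ai, bi = a.get(i, {}), b.get(i, {})
        if any(ai.get(v) != bi.get(v) for v in (set(ai) | set(bi)) - {x}):
            return False
    return True


def is_i_to_J(a: Assignment, x: str, i: str, J: Iterable[str], r: Relations) -> bool:
    """(a_i(x), a_j(x)) ∈ r_ij for every j ∈ J distinct from i."""
    return all((a[i][x], a[j][x]) in r.get((i, j), set()) for j in J if j != i)


def is_i_from_J(a: Assignment, x: str, i: str, J: Iterable[str], r: Relations) -> bool:
    """(a_j(x), a_i(x)) ∈ r_ji for every j ∈ J distinct from i."""
    return all((a[j][x], a[i][x]) in r.get((j, i), set()) for j in J if j != i)


# Hypothetical domain relations among peers i, j, k.
r: Relations = {
    ("i", "j"): {(1, 10)}, ("i", "k"): {(1, 20)},
    ("j", "i"): {(10, 1)}, ("k", "i"): {(20, 1)},
}
a = {"i": {"x": 1, "y": 5}, "j": {"x": 10}, "k": {"x": 20}}
b = {"i": {"x": 1, "y": 5}, "j": {"x": 11}, "k": {"x": 20}}  # an x-variation of a

print(is_i_to_J(a, "x", "i", {"j", "k"}, r), is_i_from_J(a, "x", "i", {"j", "k"}, r))  # True True
print(is_x_variation(a, b, "x"), is_i_to_J(b, "x", "i", {"j", "k"}, r))  # True False
```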



Definition 8 (Satisfiability of coordination formulas). The relational space $\langle db, r\rangle$
satisfies a coordination formula A under the assignment $a = \{a_i\}_{i \in I}$, in symbols
$\langle db, r\rangle \models A[a]$, according to the following rules:

1. $\langle db, r\rangle \models i : \phi[a]$, if for each $m \in db_i$, $m \models \phi[a_i]$;

2. $\langle db, r\rangle \models A \rightarrow B[a]$, if $\langle db, r\rangle \models A[a]$ implies that $\langle db, r\rangle \models B[a]$;

3. $\langle db, r\rangle \models A \wedge B[a]$, if $\langle db, r\rangle \models A[a]$ and $\langle db, r\rangle \models B[a]$;

4. $\langle db, r\rangle \models A \vee B[a]$, if $\langle db, r\rangle \models A[a]$ or $\langle db, r\rangle \models B[a]$;

5. $\langle db, r\rangle \models \forall i : x.A[a]$, if $\langle db, r\rangle \models A[a']$ for all assignments $a'$ that are x-variations
of a and that are i-to-J-assignments on x, where J is the coordination space of x
in A;

6. $\langle db, r\rangle \models \exists i : x.A[a]$, if $\langle db, r\rangle \models A[a']$ for some assignment $a'$ that is an
x-variation of a and that is an i-from-J-assignment on x, where J is the coordination
space of x in A.
A coordination formula A is valid if it is true in all relational spaces. A coordination
formula A is a logical consequence of a set of coordination formulas Γ if, for any
relational space $\langle db, r\rangle$ and for any assignment a, if $\langle db, r\rangle \models \Gamma[a]$ then $\langle db, r\rangle \models A[a]$.

Item 1 states that an atomic coordination formula is satisfied (under the assignment
a) if all the relational databases $m \in db_i$ satisfy it. Items 2–4 enforce the standard
interpretation of the boolean connectives. Item 5 states that a universally quantified
coordination formula is satisfied if all its instances are satisfied, where an instance is
obtained by substituting the free occurrences of x in the atomic coordination formulas with
index i with the elements of $dom_i$, and the free occurrences of x in the atomic coordination
formulas with index j different from i with the elements of $dom_j$ obtained by applying $r_{ij}$
to those elements of $dom_i$. Item 6 has the dual interpretation.
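Definition 8 can be turned into a small evaluator for finite relational spaces. The sketch below makes several simplifying assumptions of our own. Local formulas are single predicates applied to variables (no local quantifiers or constants), each $db_i$ is a finite list of models mapping predicate names to sets of tuples, and domains are finite. Quantifiers are evaluated by enumerating exactly the x-variations required by items 5 and 6. The usage lines check the constraint $\forall 1 : x.(1 : p(x) \vee 2 : q(x))$ and an existential formula on hypothetical data.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, FrozenSet, Hashable, Iterator, List, Set, Tuple, Union


@dataclass(frozen=True)
class Atom:  # i : P(x1, ..., xn), with P applied to variables only
    index: str
    pred: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class BinOp:
    op: str  # "->", "and" or "or"
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Quant:
    q: str  # "forall" or "exists"
    index: str
    var: str
    body: "Formula"


Formula = Union[Atom, BinOp, Quant]
Model = Dict[str, Set[Tuple[Hashable, ...]]]  # predicate -> extension
Assignment = Dict[str, Dict[str, Hashable]]   # a[i][x] ∈ dom_i


def space(x: str, f: Formula) -> FrozenSet[str]:
    """Coordination space of x in f (Definition 6)."""
    if isinstance(f, Atom):
        return frozenset({f.index}) if x in f.args else frozenset()
    if isinstance(f, BinOp):
        return space(x, f.left) | space(x, f.right)
    return frozenset() if f.var == x else space(x, f.body)


@dataclass
class RelationalSpace:
    db: Dict[str, List[Model]]                                # db_i
    dom: Dict[str, Set[Hashable]]                             # dom_i
    r: Dict[Tuple[str, str], Set[Tuple[Hashable, Hashable]]]  # r_ij

    def variations(self, f: Quant, a: Assignment) -> Iterator[Assignment]:
        """x-variations of a: i-to-J for forall (item 5), i-from-J for exists (item 6)."""
        i, x = f.index, f.var
        others = sorted(space(x, f.body) - {i})
        for d in self.dom[i]:
            if f.q == "forall":  # forward: a_j(x) ranges over r_ij(d)
                choices = [{e for (s, e) in self.r.get((i, j), set()) if s == d} for j in others]
            else:                # backward: a_j(x) ranges over {d' | (d', d) ∈ r_ji}
                choices = [{s for (s, e) in self.r.get((j, i), set()) if e == d} for j in others]
            for combo in product(*choices):
                a2 = {k: dict(v) for k, v in a.items()}
                a2.setdefault(i, {})[x] = d
                for j, dj in zip(others, combo):
                    a2.setdefault(j, {})[x] = dj
                yield a2

    def sat(self, f: Formula, a: Assignment) -> bool:
        """<db, r> |= f[a], following items 1-6 of Definition 8."""
        if isinstance(f, Atom):  # item 1: true in every model of db_i
            row = tuple(a[f.index][v] for v in f.args)
            return all(row in m.get(f.pred, set()) for m in self.db[f.index])
        if isinstance(f, BinOp):  # items 2-4
            if f.op == "->":
                return not self.sat(f.left, a) or self.sat(f.right, a)
            if f.op == "and":
                return self.sat(f.left, a) and self.sat(f.right, a)
            if f.op == "or":
                return self.sat(f.left, a) or self.sat(f.right, a)
            raise ValueError(f"unknown connective {f.op!r}")
        results = (self.sat(f.body, a2) for a2 in self.variations(f, a))
        return all(results) if f.q == "forall" else any(results)


I2 = {"1": {"a", "b"}, "2": {"a2", "b2"}}
R = {("1", "2"): {("a", "a2"), ("b", "b2")}}
s = RelationalSpace(db={"1": [{"p": {("a",)}}], "2": [{"q": {("b2",)}}]}, dom=I2, r=R)
c = Quant("forall", "1", "x", BinOp("or", Atom("1", "p", ("x",)), Atom("2", "q", ("x",))))
e = Quant("exists", "2", "x", BinOp("and", Atom("1", "p", ("x",)), Atom("2", "q", ("x",))))
print(s.sat(c, {}), s.sat(e, {}))  # True False
```

Because item 1 quantifies over the models of $db_i$, an empty (inconsistent) $db_i$ makes every atomic formula with index i vacuously true in this evaluator as well.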

Finally, notice that the language of coordination formulas does not include negation.
Adding negation with the canonical interpretation “$\neg A$ is true iff A is not true”
would make “global inconsistency” possible, i.e., there would be sets of
inconsistent coordination formulas (e.g., $\{i : \phi, \neg(i : \phi)\}$) that are not
satisfiable by any relational space. On the other hand, the relational space
composed of all inconsistent databases is the “most inconsistent” object that we can
have (not allowing global inconsistency); we therefore should allow this vacuous
distributed interpretation to satisfy any set of coordination formulas. Indeed we have
that, in the absence of negation, if $db_i = \emptyset$ for every $i \in I$ and … $= \emptyset$, then
$\langle db, r\rangle \models A$ for any coordination formula A.

Coordination formulas can be used in two different ways. First, they can be used to 
define constraints that must be satisfied by a relational space. For instance, the formula 
$\forall 1 : x.(1 : p(x) \vee 2 : q(x))$ states that any object in database 1 either is in table p or its
corresponding object in database 2 is in table q. This is a useful constraint when we want 
to declare that certain data are available in a set of databases, without declaring exactly 
where. As far as we know, other proposals in the literature for expressing inter-database 
constraints can be uniformly represented in terms of coordination formulas. 

Coordination formulas can also be used to express queries. In this case, a coordination
formula is interpreted as a deductive rule that derives new information based
on information already present in other databases. For instance, the coordination formula
$\forall 1 : x.(1 : \exists y.p(x, y) \rightarrow 2 : q(x))$ allows us to derive $q(b)$ in database 2 if $p(a, c)$ holds
in database 1 for some c, and $b \in r_{12}(a)$.
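The derivation can be sketched directly. This is our own illustration, not the paper's procedure: it assumes $db_1$ is a finite list of models, applies the rule once (no iteration), and adds the derived facts to every model of $db_2$. With an incomplete $db_1$, the premise only needs some p-successor of a in each model, not necessarily the same c.

```python
from typing import Dict, Hashable, List, Set, Tuple

Model = Dict[str, Set[Tuple[Hashable, ...]]]
DomainRelation = Set[Tuple[Hashable, Hashable]]


def derive_q(db1: List[Model], dom1: Set[Hashable], r12: DomainRelation) -> Set[Tuple[Hashable]]:
    """Facts q(b) licensed in database 2 by ∀1 : x.(1 : ∃y.p(x, y) → 2 : q(x)).

    For each a in dom_1 such that every model of db_1 contains some p(a, c),
    derive q(b) for every b in r_12(a).
    """
    derived: Set[Tuple[Hashable]] = set()
    for a in dom1:
        if all(any(row[0] == a for row in m.get("p", set())) for m in db1):
            derived |= {(b,) for (s, b) in r12 if s == a}
    return derived


def add_facts(db2: List[Model], pred: str, facts: Set[Tuple[Hashable, ...]]) -> List[Model]:
    """Extend every model of db_2 with the derived facts (returns new models)."""
    return [{**m, pred: set(m.get(pred, set())) | facts} for m in db2]


r12 = {("a", "b")}
# Hypothetical incomplete db_1: c is unknown, but a has a p-successor in each model.
db1 = [{"p": {("a", "c")}}, {"p": {("a", "e")}}]
print(derive_q(db1, {"a", "d"}, r12))  # {('b',)}
```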

Definition 9 (i-query). An i-query on a family of relational languages $\{L_i\}_{i \in I}$ is a
coordination formula of the form $A(\mathbf{x}) \rightarrow i : q(\mathbf{x})$, where $A(\mathbf{x})$ is a coordination
formula, q is a new n-ary predicate symbol of $L_i$, and $\mathbf{x}$ contains n variables.



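Definition 10 (Global answer to an i-query). Let $\langle db, r\rangle$ be a relational space on
$\{L_i\}_{i \in I}$. The global answer of an i-query of the form $A(\mathbf{x}) \rightarrow i : q(\mathbf{x})$ in $\langle db, r\rangle$ is the
set:

$\{\mathbf{d} \in dom_i^n \mid \langle db, r\rangle \models \exists i : \mathbf{x}.(A(\mathbf{x}) \wedge i : \mathbf{x} = \mathbf{d})\}$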



Notationally, $\mathbf{x} = \mathbf{d}$ stands for $x_1 = d_1 \wedge \ldots \wedge x_n = d_n$, and $\exists i : \mathbf{x}$ stands for
$\exists i : x_1 \ldots \exists i : x_n$. Intuitively, the global answer to an i-query is computed by locally
evaluating in $db_j$ all the atomic coordination formulas with index j contained in A, and
by recursively composing and mapping (via the domain relations) these results according
to the connectives and quantifiers that compose the coordination formula A. For instance,
to evaluate the query

$(i : P(x) \vee j : Q(x)) \wedge k : R(x, y) \rightarrow h : q(x, y)$

we separately evaluate P(x), Q(x) and R(x, y) in i, j and k respectively, and we map these
results via $r_{ih}$, $r_{jh}$, and $r_{kh}$ respectively, obtaining three sets $s_i \subseteq dom_h$, $s_j \subseteq dom_h$
and $s_k \subseteq dom_h^2$. We then compose $s_i$, $s_j$ and $s_k$ following the connectives, obtaining
$(s_i \cup s_j) \cap s_k$ (i.e., the pairs in $s_k$ whose first component is in $s_i \cup s_j$), which is the
global answer of q.
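The composition just described can be sketched as follows. It follows the informal recipe in the text rather than implementing Definition 10 literally (it does not enforce the side conditions of the i-from-J-assignments), assumes consistent databases, and reads the final intersection as a join on x. The peers, predicate names and data are hypothetical.

```python
from itertools import product
from typing import Dict, Hashable, List, Set, Tuple

Model = Dict[str, Set[Tuple[Hashable, ...]]]
DomainRelation = Set[Tuple[Hashable, Hashable]]


def certain(models: List[Model], pred: str) -> Set[Tuple[Hashable, ...]]:
    """Tuples of pred true in every model (assumes at least one model)."""
    return set.intersection(*(set(m.get(pred, set())) for m in models))


def translate(rows: Set[Tuple[Hashable, ...]], r: DomainRelation) -> Set[Tuple[Hashable, ...]]:
    """Map every component of every row through r, keeping all combinations."""
    def image(v: Hashable) -> Set[Hashable]:
        return {e for (s, e) in r if s == v}
    return {combo for row in rows for combo in product(*(image(v) for v in row))}


def answer(db: Dict[str, List[Model]], r: Dict[Tuple[str, str], DomainRelation],
           i: str, j: str, k: str, h: str) -> Set[Tuple[Hashable, Hashable]]:
    """(i : P(x) ∨ j : Q(x)) ∧ k : R(x, y) → h : q(x, y), composed as in the text."""
    s_i = {x for (x,) in translate(certain(db[i], "P"), r[(i, h)])}
    s_j = {x for (x,) in translate(certain(db[j], "Q"), r[(j, h)])}
    s_k = translate(certain(db[k], "R"), r[(k, h)])
    return {(x, y) for (x, y) in s_k if x in s_i | s_j}


db = {"i": [{"P": {(1,)}}], "j": [{"Q": {(2,)}}], "k": [{"R": {(10, 20), (30, 40)}}]}
r = {("i", "h"): {(1, "A")}, ("j", "h"): {(2, "B")},
     ("k", "h"): {(10, "A"), (20, "X"), (30, "C"), (40, "Y")}}
print(answer(db, r, "i", "j", "k", "h"))  # {('A', 'X')}
```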

Notice that the same query q has different answers depending on the database it
is asked to (because of the quantification over $i : \mathbf{x}$). Notice also that Definition 10
reduces to the usual notion of answer to a query when A is an atomic coordination
formula $i : \phi$ (case of a single database i). Finally, but most importantly, queries can
be recursively composed. Indeed, a recursive query can be defined as a set of queries
$\{q_h := A_h(\mathbf{x}_h) \rightarrow i_h : q_h(\mathbf{x}_h)\}_{1 \le h \le n}$ such that $A_h(\mathbf{x}_h)$ can contain an atomic
coordination formula $i_k : q_k(\mathbf{x}_k)$ for some $1 \le k \le n$. The evaluation of a query $q_h$
in the $i_h$-th database is done by evaluating its body, i.e., the coordination formula $A_h$,
which contains the query $q_k$. This forces the evaluation of the query $q_k$ in the $i_k$-th
database, and so on across the P2P network. We can prove the following theorem.

Theorem 1. Let $\langle db, r\rangle$ be a relational space and $rq = \{q_h := A_h(\mathbf{x}_h) \rightarrow i_h :
q_h(\mathbf{x}_h)\}_{1 \le h \le n}$ be a recursive query. If no $A_h(\mathbf{x}_h)$ contains the symbol $\rightarrow$, then there
are n minimal sets $ans_1, \ldots, ans_n$ such that each $ans_h$ is the global answer of the
query $q_h$ in the relational space $\langle db', r\rangle$, where $db'$ is obtained by extending every
relational database $m \in db_{i_k}$ with $m(q_k) = ans_k$, for each $k$ … $h$.
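Theorem 1 guarantees minimal answers. One standard way to compute such a least fixpoint, borrowed from Datalog and not necessarily the authors' procedure, is naive bottom-up iteration. The sketch uses a single hypothetical recursive query (the transitive closure of edges stored at peer k, asked at peer h) and assumes the domain relation $r_{kh}$ is a renaming function.

```python
from typing import Callable, Set, Tuple

Pair = Tuple[str, str]


def least_fixpoint(step: Callable[[Set[Pair]], Set[Pair]]) -> Set[Pair]:
    """Naive bottom-up evaluation: iterate ans := step(ans) from the empty set.

    Terminates when step is monotone and the domain is finite; bodies built
    only from atoms, ∧, ∨ and quantifiers (no '->') are monotone.
    """
    ans: Set[Pair] = set()
    while True:
        new = step(ans)
        if new == ans:
            return ans
        ans = new


# Hypothetical peers: k stores edges e with its own constants, h hosts the query q.
e_k = {("a", "b"), ("b", "c"), ("c", "d")}
r_kh = {"a": "A", "b": "B", "c": "C", "d": "D"}  # a functional domain relation
e_h = {(r_kh[x], r_kh[y]) for (x, y) in e_k}


def step(q: Set[Pair]) -> Set[Pair]:
    # q(x, y) := k : e(x, y) ∨ ∃z.(h : q(x, z) ∧ k : e(z, y)), with e mapped into dom_h
    return e_h | {(x, y) for (x, z) in q for (z2, y) in e_h if z == z2}


print(sorted(least_fixpoint(step)))
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```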

5 Representation Theorems 

In this section we generalize Reiter’s semantic characterization of relational databases 
to relational spaces. We start by recalling Reiter’s result (in a slightly different, but 
equivalent, formulation). 

Definition 11 (Generalized relational theory). A theory T on the relational language
L is a generalized relational theory if the following conditions hold.

- if $dom = \{d_1, \ldots, d_n\}$, then $\forall x(x = d_1 \vee \ldots \vee x = d_n) \in T$;

- for any distinct $d, d' \in dom$, $d \neq d' \in T$;

- for any relational symbol $R \in \mathcal{R}$, there is a finite number of finite sets of tuples
$S_1, \ldots, S_m$ (the possible extensions of R) such that T contains the axiom:

$\bigvee_{1 \le k \le m} \left(\forall \mathbf{x} \left(R(\mathbf{x}) \leftrightarrow \bigvee_{\mathbf{d} \in S_k} \mathbf{x} = \mathbf{d}\right)\right)$

Reiter proves that any partial relational database can be uniquely represented by a 
generalized relational theory. The generalization to the case of multiple partial databases 
models each of them as a generalized relational theory, and “coordinates” them using 
an appropriate coordination formula which axiomatizes the domain relation. 

Definition 12 (Domain relation extension). Let $r_{ij}$ be a domain relation. The set of
coordination formulas for the extension of $r_{ij}$ is the set $R_{ij}$ that contains the
coordination formula $\exists j : x.(i : x = d \wedge j : x = d')$ if $d' \in r_{ij}(d)$, and the coordination
formula $\forall i : x.(i : x = d \rightarrow j : x \neq d')$ if $d' \notin r_{ij}(d)$.
Theorem 2 (Characterization of domain relations). Let $R_{ij}$ be the set of coordination
formulas for the extension of $r_{ij}$. For any relational space $\langle db, r'\rangle$ with $db_i$ and $db_j$
different from the empty set, $\langle db, r'\rangle \models R_{ij}$ if and only if $r_{ij} = r'_{ij}$.

Theorem 2 states that, when $db_i$ and $db_j$ are consistent databases, the only domain
relation from i to j that satisfies the coordination formulas for the extension of $r_{ij}$ (i.e.,
$R_{ij}$) is $r_{ij}$ itself. This means that $R_{ij}$ uniquely characterizes $r_{ij}$. The characterization
of a relational space (Theorem 3) is obtained by composing the characterization of local
databases (Reiter's result) and the characterization of the domain relation (Theorem 2).
A corollary of the relational space's characterization (Corollary 1) provides a
characterization, in terms of logical consequence, of the global answer to an i-query.
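For finite domains, Theorem 2 can be checked by brute force. The sketch renders $R_{ij}$ as strings and then enumerates every candidate $r'_{ij} \subseteq dom_i \times dom_j$. It assumes non-empty local databases with distinct constants, so that $i : x = d$ holds exactly when $a_i(x) = d$. Under that reading each ∃ formula forces a pair into $r'_{ij}$ and each ∀ formula forbids one. Data and function names are hypothetical.

```python
from itertools import chain, combinations, product
from typing import Hashable, Iterator, List, Set, Tuple

DomainRelation = Set[Tuple[Hashable, Hashable]]


def extension_formulas(r: DomainRelation, dom_i, dom_j, i: str = "i", j: str = "j") -> List[str]:
    """R_ij of Definition 12, rendered as strings."""
    out = []
    for d, d2 in product(sorted(dom_i), sorted(dom_j)):
        if (d, d2) in r:
            out.append(f"∃{j}:x.({i}:x={d} ∧ {j}:x={d2})")
        else:
            out.append(f"∀{i}:x.({i}:x={d} → {j}:x≠{d2})")
    return out


def satisfies_extension(r_prime: DomainRelation, r: DomainRelation, dom_i, dom_j) -> bool:
    """Does <db, r'> satisfy R_ij (non-empty db_i, db_j; distinct constants)?"""
    for d, d2 in product(dom_i, dom_j):
        if (d, d2) in r and (d, d2) not in r_prime:
            return False  # the ∃j:x formula needs an assignment with (d, d2) ∈ r'_ij
        if (d, d2) not in r and (d, d2) in r_prime:
            return False  # the ∀i:x formula is falsified by (d, d2) ∈ r'_ij
    return True


def all_relations(dom_i, dom_j) -> Iterator[DomainRelation]:
    """Every subset of dom_i × dom_j."""
    pairs = list(product(dom_i, dom_j))
    subsets = chain.from_iterable(combinations(pairs, n) for n in range(len(pairs) + 1))
    return (set(c) for c in subsets)


dom_i, dom_j = {1, 2}, {"a", "b"}
r_ij = {(1, "a"), (2, "a")}
print(extension_formulas(r_ij, dom_i, dom_j)[:2])
# ['∃j:x.(i:x=1 ∧ j:x=a)', '∀i:x.(i:x=1 → j:x≠b)']
matches = [rp for rp in all_relations(dom_i, dom_j) if satisfies_extension(rp, r_ij, dom_i, dom_j)]
print(matches == [r_ij])  # True: only r_ij itself satisfies R_ij
```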

Definition 13 (Relational multi-context system). A relational multi-context system
for a family of relational languages $\{L_i\}$ is a pair $\langle T, R\rangle$, where T is a function that
associates to each i a generalized relational theory $T_i$ on the language $L_i$, and R is
a set that contains all the coordination formulas for the extension of a domain relation
from i to j, for any $i, j \in I$.

Theorem 3 (Representation of relational spaces). For any relational multi-context
system $\langle T, R\rangle$ there is a unique (up to isomorphism) relational space $\langle db, r\rangle$ with the
following properties:

1. $\langle db, r\rangle \models i : T_i$ and $\langle db, r\rangle \models R$.

2. For each $i \in I$, $db_i$ is different from the empty set.

3. $\langle db, r\rangle$ is maximal, i.e., for any other relational space $\langle db', r'\rangle$ satisfying conditions
1 and 2, $db'_i \subseteq db_i$ and $r_{ij} = r'_{ij}$ for all $i, j \in I$.

Vice versa, for any relational space $\langle db, r\rangle$, there is a relational multi-context system
$\langle T, R\rangle$ such that the maximal model of $\langle T, R\rangle$ is $\langle db, r\rangle$. We say that $\langle T, R\rangle$ is the
multi-context system that represents $\langle db, r\rangle$.

Corollary 1 (Semantic characterization of queries). Let $\langle T, R\rangle$ be the relational
multi-context system that represents the relational space $\langle db, r\rangle$. For any i-query
$q := A(\mathbf{x}) \rightarrow i : q(\mathbf{x})$, the n-tuple $\mathbf{d}$ belongs to the global answer of q if and only if

$\{i : T_i\}_{i \in I}, R \models \exists i : \mathbf{x}.(A(\mathbf{x}) \wedge i : \mathbf{x} = \mathbf{d})$

Corollary 1 provides the basis for a correct and complete implementation of a
query answering mechanism in a P2P environment.



6 Related Work 

The formalism presented in this paper is an extension of the Distributed First Order
Logics (DFOL) formalism proposed in […]. The main improvements concern the language of the
coordination formulas, their semantics and the calculus. In […], indeed, relations between
databases were expressed via domain constraints and interpretation constraints. These
correspond to particular coordination formulas: namely, domain constraints from
i to j correspond to the coordination formulas $\forall i : x\, \exists j : y.\, i : x = y$ and $\forall j : x\, \exists i : y.\, i : x = y$,
while interpretation constraints can be translated into the coordination formulas
$\forall i : x.(i : \phi(x) \rightarrow j : \psi(x))$. This limitation on the expressive power does not allow one
to express in DFOL the fact that a table, say p, of a database i is the union of two tables,
say $p_1$ and $p_2$, of two different databases j and k. This constraint can be easily expressed
by the following coordination formula:

$\forall i : x.(i : p(x) \rightarrow j : p_1(x) \vee k : p_2(x))$
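A hypothetical checker for the union constraint above, assuming complete databases whose tables are plain sets of values, enumerates the i-to-{j, k}-assignments of x and reports counterexamples. An element of p with no image under $r_{ij}$ or $r_{ik}$ yields no assignment and therefore no violation.

```python
from itertools import product
from typing import Hashable, List, Set, Tuple

DomainRelation = Set[Tuple[Hashable, Hashable]]


def image(r: DomainRelation, d: Hashable) -> Set[Hashable]:
    return {e for (s, e) in r if s == d}


def union_violations(p: Set[Hashable], dom_i: Set[Hashable],
                     p1: Set[Hashable], p2: Set[Hashable],
                     r_ij: DomainRelation, r_ik: DomainRelation) -> List[Tuple]:
    """Counterexamples to ∀i : x.(i : p(x) → j : p1(x) ∨ k : p2(x)).

    Each i-to-{j, k}-assignment picks d in dom_i, d_j in r_ij(d), d_k in r_ik(d);
    it is a counterexample if p(d) holds but neither p1(d_j) nor p2(d_k) does.
    """
    bad = []
    for d in sorted(dom_i):
        if d not in p:
            continue  # the implication is vacuously true
        for dj, dk in product(sorted(image(r_ij, d)), sorted(image(r_ik, d))):
            if dj not in p1 and dk not in p2:
                bad.append((d, dj, dk))
    return bad


# Hypothetical tables and domain relations.
dom_i = {1, 2, 3}
p = {1, 2, 3}
p1, p2 = {"a"}, {"y"}
r_ij = {(1, "a"), (2, "b"), (3, "c")}
r_ik = {(1, "x"), (2, "y"), (3, "z")}
print(union_violations(p, dom_i, p1, p2, r_ij, r_ik))  # [(3, 'c', 'z')]
```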



As far as the query language is concerned, our approach is similar in some ways to view- 
based data integration techniques, in the following sense. The process of translating a 
query against a local database into queries against an acquaintance would be driven by the 
coordination formulas that relate those two databases. If one thinks of our coordination 
formulas as view definitions, then the translation process is comparable to ones used for 
rewriting queries based on view definitions in the local-as-view (LAV) and global-as-view
(GAV) approaches ([…]). Although standard approaches cannot be applied directly to
LRM, due to our use of domain relations and context-dependent coordination formulas,
we expect it is possible to modify LAV/GAV query processing strategies for LRM. For
example, one could define a sublanguage of LRM whose power is comparable to a
tractable view definition language used for LAV/GAV query processing. One could then
apply a modified LAV/GAV algorithm to that language. Or perhaps one could translate
formulas and queries from the LRM sublanguage into a non-LRM language (e.g., a Datalog dialect)
and apply a conventional LAV/GAV query processing algorithm. If such a translation 
of formulas and queries proves to be feasible, then it would be important to compare 
the LRM notation to its translation in the non-LRM language, for example to determine 
their relative clarity and compactness. 

Finally, our approach provides a general theoretical reference framework in which many
forms of inter-schema constraints defined in the literature, such as those in […], can be expressed.
For lack of space we briefly show only one case. Consider for instance the directional
existence dependencies defined in […]. Let $T_1[X_1, Y_1]$ and $T_2[X_2, Y_2]$ be two tables of
a source database (let's say 1), and let $T[C_1, C_2, C_3]$ be a table of the target database
(let's say 2). An example of directional existence dependence is:



$T.(C_1, C_2) \supseteq$ select $X_1, X_2$ from $T_1, T_2$ where $T_1.X_1 < T_2.X_2$ (8)

The informal semantics of (8) is that for each tuple of values $(V_1, V_2)$ produced by the
RHS select statement, there is a tuple t in table T such that t projected on columns
$C_1, C_2$ has the value $(V_1, V_2)$. The existence dependence (8) can be rewritten in terms
of coordination formulas as



$\forall 1 : x_1 x_2.\big(1 : \exists y_1 y_2.(T_1(x_1, y_1) \wedge T_2(x_2, y_2) \wedge x_1 < x_2) \rightarrow$
$\quad \exists 2 : c_1 c_2.(1 : (x_1 = c_1 \wedge x_2 = c_2) \wedge 2 : \exists c_3.T(c_1, c_2, c_3))\big)$ (9)



When the domain relations are identity functions, (9) captures the intuitive reading of (8).
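The claim can be tested on toy data. The sketch below compares the SQL reading of (8) with a reading of (9) in which $x_1$ and $x_2$ are mapped to database 2 through $r_{12}$. Tables, values and function names are hypothetical, and both databases are assumed complete. With the identity relation the two readings agree; with a non-identity relation they can differ.

```python
from typing import Hashable, Set, Tuple

DomainRelation = Set[Tuple[Hashable, Hashable]]


def image(r: DomainRelation, d: Hashable) -> Set[Hashable]:
    return {e for (s, e) in r if s == d}


def select_rhs(T1: Set[Tuple], T2: Set[Tuple]) -> Set[Tuple]:
    """select X1, X2 from T1, T2 where T1.X1 < T2.X2"""
    return {(x1, x2) for (x1, _y1) in T1 for (x2, _y2) in T2 if x1 < x2}


def satisfies_8(T1: Set[Tuple], T2: Set[Tuple], T: Set[Tuple]) -> bool:
    """SQL reading of (8): every RHS tuple occurs in T projected on (C1, C2)."""
    return select_rhs(T1, T2) <= {(c1, c2) for (c1, c2, _c3) in T}


def satisfies_9(T1: Set[Tuple], T2: Set[Tuple], T: Set[Tuple], r12: DomainRelation) -> bool:
    """Reading of (9): each RHS tuple (x1, x2) has counterparts c1 ∈ r_12(x1),
    c2 ∈ r_12(x2) such that T(c1, c2, c3) holds in database 2 for some c3."""
    proj = {(c1, c2) for (c1, c2, _c3) in T}
    return all(any((c1, c2) in proj for c1 in image(r12, x1) for c2 in image(r12, x2))
               for (x1, x2) in select_rhs(T1, T2))


T1, T2 = {(1, "u")}, {(2, "v"), (0, "w")}  # the RHS select yields {(1, 2)}
r_id = {(v, v) for v in range(4)}
T_good, T_bad = {(1, 2, "z")}, {(1, 3, "z")}
print(satisfies_8(T1, T2, T_good), satisfies_9(T1, T2, T_good, r_id))  # True True
print(satisfies_8(T1, T2, T_bad), satisfies_9(T1, T2, T_bad, r_id))    # False False

# A non-identity relation (hypothetical rescaling) separates the two readings.
r_shift, T_shift = {(1, 10), (2, 20)}, {(10, 20, "z")}
print(satisfies_8(T1, T2, T_shift), satisfies_9(T1, T2, T_shift, r_shift))  # False True
```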



7 Conclusion 

We have argued that emerging computing paradigms, such as P2P computing, call for 
new data management mechanisms which do away with the global schema assump- 
tion inherent in current data models. Moreover, in a P2P setting the emphasis is on 
coordinating databases, rather than integrating them. This coordination is defined by an
evolving set of coordination formulas which are used both for constraint enforcement
and query processing. To meet these challenges, the paper proposes
the local relational model, LRM, where the data to be managed constitute a relational 
space, conceived as a collection of local databases inter-related through coordination 
formulas and domain relations. The main result of the paper is to define a model theory 
for the LRM. We use this semantics to generalize an earlier result due to Reiter which 
characterizes a relational space as a multi-context system. The results of this paper offer 
a sound springboard for launching a study of implementation techniques for the LRM,
its query processing and constraint enforcement. 
Abstract. Discussions in philosophy of language, semantics, and pragmatics
often make crucial use of the notion of what is said. It is held that in order to
account for our intuitions on what is said, we need a distinguished semantic 
level. A tripartite distinction is made among what the sentence means 
independently from the context of utterance, what it means (or “says”) within 
the context, and what the speaker means (or “conveys”). I will challenge the 
need for that intermediate level of meaning, and argue that the enterprise of 
drawing a neat distinction between meaning and what is said is pretty hopeless. 
My main point is that our intuitions on what is said cannot be detached from the 
ways in which we talk about it, and from the semantics of speech-reports and 
attitude-reports in general. 



1 Stirring Up Our Intuitions on What Is Said 

What we say and what others say are things that undeniably play important roles in 
our lives. People get arrested for what they say, friendships break or come about 
because of what someone has said, and so on. There is little doubt that we have a 
certain intuitive notion of what is said, and attempts to account for it should be 
welcomed. However, this supposedly intuitive notion of what is said has also been 
used to draw the line between semantics and pragmatics, and to ground some 
substantial claims about semantics. It is widely held that, given some basic facts about 
the context in which a sentence is uttered, semantics, helped by the syntax of the 
sentence, allows us to figure out what is said by the utterance, which is what provides 
the utterance with a truth value, given further facts about the world. On the other 
hand, what the speaker is trying to achieve by means of the utterance most often goes 
beyond the reach of semantics. We need further facts about the context, concerning 
the speaker’s beliefs and intentions, to figure out what is conveyed by the utterance. 
And the latter, it is held, is the realm of pragmatics.¹

So far, so good. For, given any utterance u, the distinction is being made between
the ‘semantic’ level of meaning, which may still be thought of as context-invariant
and possessed by u in virtue of the sentence alone of which u is an utterance, and the
‘pragmatic’ level of the action performed by means of u.² Suppose that I say “It’s

¹ I may be giving a caricature of the received wisdom, but the distinction drawn along these
lines may be found all over the place. See e.g. [2], [7], or [10].

² Of course, what u, qua a mere string of sounds, has as its linguistic meaning, said to be
‘context-invariant,’ itself depends upon the context, since the context provides the language
in which to interpret u. But once the language has been fixed, variations on other contextual
features are not going to alter the meaning of u.

P. Blackburn et al. (Eds.): CONTEXT 2003, LNAI 2680, pp. 300-313, 2003. 
warm in here.” Your linguistic knowledge, assuming that you are a competent English
speaker, allows you to grasp the meaning of my sentence, and, tentatively, what I say. 
Yet, depending on the circumstances of my utterance, you may well start wondering 
what I mean. Perhaps I mean to ask you, indirectly, to open the windows. Perhaps I 
have just opened them, so I let you know why. Perhaps the windows are open, and I 
want you to close them and turn on the air-conditioning instead. Perhaps the heating is 
out of order, the temperature has dropped below zero, and I am just being ironic.
Which among all those things I happen to be doing with my utterance is to be settled 
by considerations of the circumstances in which I made it, taking into account what I 
want, what I think, what I think that you think, and so on. 

However, with this example already, we have hit upon a controversy. For, you 
could question my claim that your knowledge of English allows you to grasp what I 
said. Don’t you need to know as well where I was when I said “It’s warm in here”? If
I say this in your living room, or in a van, or in a disco, will I be saying the same 
thing every time? Or will I be saying different things? This is, roughly, the breaking
point where our intuitions lose force and you can pull them one way or the other.
Thus, I may insist, “yes, I am saying the same thing every time, namely, that it is 
warm in there.” And you may either buy this, or protest, “no, you are saying different 
things. First, you say that it’s warm in my living room. Then you say that it’s warm in 
the van. Last, you say that it’s warm in the disco.” But now, how are we supposed to 
settle the question of what it is that I said? 

Suppose, for the sake of the argument, that I agree with you on this much: when I 
say “It’s warm in here” in your living room, I am saying that it is warm in your living 
room. Now, suppose that I say it at 6 p.m. I am warm, I would like you to open the 
windows, but you have not paid attention to what I said. So I repeat it ten minutes 
later, that is, at 6.10 p.m. The question becomes: do I now say the same thing as I did 
the first time, or do I say something different? Once again, I may insist that, yes, I 
said the same thing, namely that it was warm in there (in your living room), while you 
may disagree and say that what I said the first time is that it was warm in your living room as 
of 6 p.m., and what I said the second time is that it was warm in your living room as of 6.10 
p.m. But who is to say whether you are right, or me, or neither, or both? Is there any 
matter of fact as to what it is that I said? 

How do we identify, or individuate, what has been said? How do we decide when 
the relation of saying the same thing holds between any two utterances? There is a 
well-established tradition, from Frege, via Kaplan, to most contemporaries, to say that 
for every utterance there is a semantically relevant level of what is said, or content, or 
the proposition expressed, dependent on the context and distinct from the linguistic 
meaning of the sentence uttered. Contents are supposed to be specifiable beforehand, 
in the sense that there is a determinate method to figure out what the content is, given 
the sentence and the basic facts about the context of its utterance, like who the 
speaker is or what the time is. My aim is to argue against this tradition. I will point 
out a whole pattern of cases that seriously threaten the idea of isolating some 
determinate level of what is said, independently from any context in which the 
question of what was said has been raised. 
2 Different Meanings, Different Contents, the Same Thing Said 

I start with a few cases that motivate the distinction between meanings and contents. I 
then offer a few more cases that dis-motivate the distinction, so to speak. The lesson 
to draw is that our intuitions on what is said, on which the traditional view heavily 
relies, do not support any such neat distinction. 



2.1 How You May Be Tempted to Appeal to Contents 

The linguistic meaning of a sentence is a natural candidate to stand for what is said by 
an utterance of the sentence. But this suggestion has been widely rejected. Frege 
already wrote: “The sentence ‘I am cold’ expresses a different thought in the mouth of 
one person from what it expresses in the mouth of another.”^ 

Suppose that you utter the following sentence twice, respectively in reference to 
Laura Bush and to Hillary Clinton: 

She is arrogant. (1) 

Your first utterance of (1) attributes arrogance to Mrs. Bush, whereas in your second 
utterance, you are saying of Mrs. Clinton that she is arrogant. The intuition that you 
have not said the same thing is supported by the fact that the truth of your first 
utterance depends on Laura Bush, while the truth of your second utterance depends on 
Hillary Clinton. It seems, then, that there may be something in what is said that does 
not come from the linguistic meaning of the words uttered. The two women seem to 
have gotten somehow into what you said on the two occasions. Certain items 
furnished by the context, like the person in reference to whom a personal pronoun has 
been used, seem necessary to the understanding of the utterance. 

But if we are to take what is said to be simply the meaning plus whatever is 
required to understand the utterance, then it should work equally well to take what is 
said to be simply the meaning. The intuitive difference between your two utterances 
of (1) would be explained by the fact that you said the same thing of different people: 
first of Laura Bush, then of Hillary Clinton. 

As this first case did not take us very far, suppose that I say to Laura Bush: 

You are arrogant. (2) 

Suppose that a guy called John, who happens to be standing nearby, mistakenly thinks 
that I was talking to him. Suppose also that I do not think that he is arrogant, nor does 
he think that he is arrogant, nor does Laura Bush think that she is arrogant. Intuitively, 



^ [1], p. 235. A “thought” in Frege’s terminology is “what is said” in modern terminology. 
Kaplan’s rejection of the suggestion at stake is even more explicit: “What is said in using a 
given indexical in different contexts may be different. Thus if I say, today, ‘I was insulted 
yesterday,’ and you utter the same words tomorrow, what is said is different. (...) There are 
possible circumstances in which what I said would be true but what you said would be false. 
Thus we say different things.” [2], p. 500. 
I disagree with Laura Bush, while I do not really disagree with John: there has simply 
been some misunderstanding between the two of us.^ 

If all aspects of the meaning were preserved in what is said, then what I said in (2) 
would be something like “x is arrogant and x is being talked to,” and I would be 
saying it of Laura Bush. But then John and I would really disagree about the truth of 
what I said, since he thinks that it is him, and not Laura Bush, to whom I was talking. 
This clearly clashes with the intuition that he simply did not get what I said. In the 
light of this second case, it seems that some aspects of the meaning do not reach into 
what is said. The property of being talked to, carried by “you”, seems to drop off what 
is said.^ 

Notwithstanding appearances, this case does not take us very far either. Consider it 
once again. It rests on the assumption that everything that can be evaluated for truth 
and is part of what is said is also something about which people may disagree. Now, 
the question of who is being talked to is not really open to disagreement, which 
further suggests that certain parts of the meaning do not reach to what is said. 
However, one may well question this suggestion, while preserving the intuition that it 
is not quite appropriate to talk of disagreement in the case of (2). John and I have 
conflicting opinions as to whether Laura Bush is being talked to or not. But how 
could I possibly go wrong on the issue of whom I was talking to, John or Mrs. Bush? 
As the speaker of (2), I am the best placed to know to whom I was talking. It is then 
inappropriate to say that John disagrees with me on something about which he knows 
that I cannot go wrong. I take it, therefore, that (2) no longer motivates having 
anything beyond meanings to stand for what is said. 

So far, I have done away with two intuition-based arguments for a distinguished 
level of what is said. I now turn to the argument from same-saying, as Perry calls it.^ 
It relies on utterances that intuitively say the same thing in spite of differences in 
meaning. Consider (1) and (2) together. Their meanings are clearly different, since 
“she” is to be used for the most salient female individual, while “you” is to be used 
for the addressee. Yet, it seems that what you say in (1), talking of Laura Bush, is the 
same thing as what I say in (2), something like that Laura Bush is arrogant. We both 
say of her that she is arrogant, only, I dare tell her straight. The insight again goes 
back to Frege, who further wrote: “It is not necessary that the person who feels cold 
should himself give utterance to the thought that he feels cold. Another person can do 
this by using a name to designate the one who feels cold.” 



2.2 A Dilemma 

Suppose that your utterance of (1), in reference to Laura Bush, was made in June 
2002. For the sake of the argument, assume that Laura Bush herself is part of what is 



^ A similar case is given by Stalnaker, who notes: “What one says (...) is itself something that 
might have been different if the facts had been different; and if one is mistaken about the 
truth value of an utterance, this is sometimes to be explained as a misunderstanding of what 
was said rather than as a mistake about the truth value of what was actually said.” ([8], p. 279). 

^ As Recanati puts it: “the property of being the addressee is not a constituent of the 
proposition expressed [by an utterance in which ‘you’ occurs]: it is used only to help the 
hearer identify the reference, which is a constituent of the proposition expressed.” ([7], p. 39) 
^ [6], p. 5. 
said. Now, is the content of (1), what you said, temporally neutral? Does (1) merely 
attribute arrogance to her, a property that she could possess at certain times and lack at 
others? Or is that content temporally specific? That is to say, does (1) attribute to her the 
property of arrogance as of the time picked out by the present tense, June 2002? 

How is this issue supposed to be settled, first of all? Presumably, we ought to 
consult our own intuitions on what is said. Suppose, then, that you utter the same 
sentence again in January 2003, referring once more to Laura Bush. You say: 

She is arrogant. (3) 

Will you say the same thing as you did in (1)? I can certainly reply, “hey, that is 
precisely what you said last June.” And our judgments about the truth value of such a 
reply are that, in a suitable context, it comes out true. So, there is at least a sense in 
which what is said in (1) and what is said in (3) are one and the same thing. That is 
not sufficient evidence yet that the contents must be temporally neutral. For, suppose 
that you hold what seems to be the standard view, namely, that what is said must be a 
proposition, whose truth does not vary with time. The feeling that (1) and (3) in some 
sense say the same thing can be explained by their being utterances of one and the 
same sentence. More generally, the standard view will always have a choice between 
the context-independent meaning and the context-dependent content to account for 
what is said. 

The dilemma is whether to think of contents as temporally neutral or temporally 
specific. The ‘neutral’ horn is motivated by the intuition that the same thing may be 
said by utterances that do not express the same proposition, as in the case of (1) and 
(3). The motivation is not conclusive, and is counterbalanced by the intuition that, if 
you were asked in January 2003 to repeat what you said in June 2002, you might well 
say something like: 

In June 2002, Laura Bush was arrogant. (4) 

There is little doubt that in a suitable context, on the basis of your utterance of (1), I 
can truly reply, “indeed, that is precisely what you said in June 2002.” And if the 
same thing has been said by (1) and (4), it had better be temporally specific. 

With the dilemma in mind, let us go back to (1) and (2) jointly considered. As 
before, (1) was uttered in reference to Laura Bush in June 2002. Suppose that I uttered 
(2) in January 2003, talking directly to Mrs. Bush. Your utterance of (1) and my 
utterance of (2) intuitively say the same thing. Or, more modestly, there is a sense of 
saying the same thing in which we are doing so, since both of us are attributing 
arrogance to Laura Bush.^ 

But here comes a problem. The pair of (1) and (4) strongly supports the idea that if 
there is a separate level of content to stand for what is said, it must consist of 
temporally specific contents. On the other hand, the pair of (1) and (2) strongly 
supports the idea that those contents must be temporally neutral. For, if what is said is 
something whose truth may vary with time, then (1) becomes a mere attribution of 
arrogance to Laura Bush, and so does (2), hence they “say the same thing.” But if 
what is said is something whose truth cannot vary with time, then what you said in (1) 



7 Besides, note that our initial discussion of the intuition that (1) and (2) say the same thing 
was free of any assumptions about the times of our utterances. 
will consist of her arrogance as of June 2002, while what I said in (2) will consist of 
her arrogance as of January 2003. If you opt for temporally specific contents, 
motivated, among other things, by (1) and (4), how can you account for the intuition 
that what you said in (1) is the same as what I said in (2)? You can no longer appeal 
to the sameness of the sentences uttered, or even to the sameness of meanings of the 
sentences uttered, given that “she” and “you,” hence our sentences themselves, do not 
mean the same thing. 

We have a dilemma, then, both of whose horns are likely to leave us unhappy. But 
this is not the problem yet. For, one way of resolving the dilemma is to allow for a 
minor modification on the side of the ‘neutral’ horn. Roughly, one might say that 
temporally specific contents could be subsumed under temporally neutral contents. 
Specific contents are a particular case of neutral contents, in the same way in which 
constant functions are a particular case of functions: they yield the same truth value, 
whatever temporal input you feed in. The next step, one might say, is to spot an 
ambiguity in any present-tensed sentence. Consider again (1), as uttered in June 2002. 
On one reading, it expresses a temporally neutral content, namely, Laura Bush’s 
arrogance, which obtains at some times and not at others, as her personality changes 
through time. On the other reading, it expresses a temporally specific content, namely, 
Laura Bush’s arrogance as of June 2002. Finally, when we get the intuition that (1) 
and (4) say the same thing, it must mean that (1) has been given a specific reading, 
while when we get the intuition that (1) and (2) say the same thing, both (1) and (2) 
assume neutral readings. 

This way of resolving the dilemma is still unlikely to make us happy. For, it 
presupposes that the sentences uttered in (1) and (2) are ambiguous, which is very 
implausible. More plausibly, one could say that the sentences are not ambiguous, but 
unambiguously express two contents each: one neutral, one specific. The neutral 
content accounts for what is said by (1) in one sense (the temporally neutral sense), 
the specific content accounts for what is said by (1) in another sense (the temporally 
specific sense). So we can be happy now, if only for now. 



2.3 The Problem 

The argument from same-saying is a double-edged sword. It was designed to defend 
the traditional view, but it can equally well be turned against it. In order to handle our 
intuitions on what is said without giving up the very idea that there is some 
distinguished level of what is said, one seems forced into a position that differs from 
the received one in two crucial respects: the notion of content becomes more flexible, 
since contents may change truth value through time; and, given any utterance, instead 
of there being one content to stand for what is said, there may be several. 

So far, the situation does not appear dramatic, since we have only inquired how 
time affects what is said. But the cases that pose problems for the traditional view, far 
from being confined to the issue of time, are pervasive. Consider the following 
scenario. In June 2002, it was particularly warm in San Francisco, which we both 
know. In July the same year, we happen to be in Chicago, and it is very warm. I say: 

It’s very warm, probably warmer than in San Francisco last month. (5) 
Later, in September, we are in Stanford, and the weather is again very warm. I say: 

It’s very warm, probably warmer than in the city last June. (6) 

If my aim in making those utterances is simply to comment on the weather, then it 
should not be too hard to get the intuition that I am saying the same thing. What I am 
saying is, simply, that it’s very warm, warmer than in San Francisco in June 2002. 
But the traditional view has no handle on such cases. The propositions expressed by 
(5) and (6) are different: one is true only if it is warmer in Chicago in July 2002 than 
in San Francisco in June, while the other is true only if it is warmer in Stanford in 
September than in San Francisco in June. And the linguistic meanings of the 
sentences are obviously different. Moreover, it is no longer enough to appeal to 
temporally neutral contents. We must allow for locationally neutral contents, too. 

By way of a bonus example, suppose that we go with a friend to see Almodovar’s 
latest movie, Talk to Her. Coming out of the movie theatre, I ask you “How did you 
like it?” Later on, I turn to our friend and I ask her “How did you like the movie?” 
The linguistic meanings of the questions that I asked are different (you may use the 
pronoun “it” for anything you wish, while you may use the description “the movie” 
only for movies). At the same time, we are inclined to say that I asked the same 
question. I asked our friend what I asked you, namely, how she liked the movie in 
question, Talk to Her. If after a while I come upon someone who I know has seen 
Talk to Her, and I ask him “How did you like Talk to Her,” I will be asking the same 
question again, namely, how he liked that movie. And if in a conversation about 
Almodovar someone asks me “How did you like his latest movie,” then I will be 
asked the very question that I was previously asking, namely, how I liked the movie 
Talk to Her. 

The case clearly poses a problem for the traditional view. Propositionally, the four 
questions are different: the first is how you liked Talk to Her, the second, how our 
friend liked it, the third, how that guy that I came upon liked the same movie, and the 
fourth, how I liked it. Temporally and locationally neutral contents are of little help. 
What you need is something like contents neutral with respect to the addressee. And it 
should not take you long to think of a case that would call for contents neutral with 
respect to the speaker. But once you start making room for various sorts of contents, 
neutral with respect to various sorts of things, then the dichotomy between meaning 
and what is said is clearly lost. 

Let us see what has been done so far. We were first led to accommodate contents 
whose truth varies with time. But once we allowed for temporally neutral contents, we 
realized that we could not limit ourselves to time. We had to allow for contents whose 
truth varies with locations, or, worse, with addressees. The problem is that we are 
likely to have to allow for contents neutral with respect to all sorts of things, contents 
whose truth may vary not only with places and people, but also with points of view, 
time zones, or situations in general. We are likely to have to allow for contents whose 
truth may vary with the context, contents that assume their truth value only relative to 
a context. But wait! What becomes of the difference, then, between meanings and 
contents? Sure, if you define meanings as functions from contexts to contents, the way 



It might be worth making it clear that when in Stanford one talks of “the city,” one means 
San Francisco. 



